With the advent of knowledge-based instructional systems that can
answer trainees' questions, critique their hypotheses and automatically
provide remedial hints, the need for a man-machine interface that
facilitates rather than hinders a student's communication with the machine
becomes ever more pressing. This report describes a general technique for
generating "friendly", efficient and robust natural language front ends for
advanced instructional systems. The generality of this technique has been
proved by its successful application in a range of instructional systems;
its efficiency has turned out to rival that of the keyword parsers which
underlie most of the classical CAI systems; its robustness has been attested
to by the fact that it has been able to handle nearly every serious query
posed to our electronic instructional systems in the course of a lesson or
exercise.

In this report we first discuss the essential properties that make up
a "friendly" natural language front-end for an instructional system. Next,
we discuss some prior systems that have some, but not all, of the desired
capabilities, and then we focus on the technical details underlying
"semantic grammars", a new technique for producing the desired
man-machine interfaces. Although there is little emphasis placed on the
analysis of how students used the capabilities afforded by this kind of
natural language interface (made possible by semantic grammars), a
companion report contains the analysis of nearly twelve thousand natural
language interactions collected from students using instructional systems
built around this technique.

Chapter 1


REQUIREMENTS  FOR  A NATURAL 
LANGUAGE  INTERFACE  FOR  INSTRUCTIONAL  SYSTEMS 

This research arose from the need for natural language interfaces to
complex instructional systems which underlie reactive training environments.
As used here, the term "reactive training environment" refers to flexible
problem solving, laboratory-like situations that have been implemented on a
computer. The environment is reactive in the sense that the computer can
(in addition to implementing the laboratory) monitor the student's
activities and provide tutorial feedback during the solution of problems.
A characteristic of such systems is that the computer-naive students are
involved in a training situation in which the computer is merely the
medium. Most certainly these students are not interested in state-of-the-art
man-machine communication; they must be free to concentrate on solving
their problems and learning from their solution paths and errors.

This instructional environment places constraints on a natural
language understanding system that exceed the capabilities of all existing
systems. These constraints include: (1) efficiency, (2) habitability, (3)
self-teachability, and (4) the ability to exist with ambiguity. In the
remainder of this chapter we will explore why these are important, and then
provide an overview of the remainder of this report.

Requirements 

A primary  requirement  for  a natural  language  processor,  in  an 
instructional  situation,  is  speed.  Imagine  the  following  setting:  the 
student  is  at  a terminal  actively  working  on  a problem.  He  decides  that  he 
needs  another  piece  of  information  to  advance  his  solution,  so  he 
formulates  a query.  Once  he  has  finished  typing  his  question,  he  will  wait 
for  the  system  to  give  him  an  answer  before  he  continues  working  on  his 
solution. During the time it takes the system to parse his query, the
student is apt to forget pertinent information and lose interest.
Psychological experiments have shown that response delays longer than two
seconds have serious effects on the performance of complex tasks via
terminals (Miller 68). In these two seconds, the system must understand
the query; deduce, infer, look up or calculate the answer; and generate a
response.(1)


The second requirement for a natural language front-end is
habitability. Any natural language system written in the foreseeable future
is not going to be able to understand all of natural language. What it
must do is characterize and understand a usable subset of the language.
Watt (1968 p. 338) defines a "habitable" sub-language as "one in which its
users can express themselves without straying over the language boundaries
into unallowed sentences". Very intuitively, for a system to be habitable
it must, among other things, allow the user to make local or minor
modifications to an accepted sentence and get another accepted sentence.
Exactly how much modification constitutes a minor change has never been
specified. Some examples may provide more insight into this notion.

1) Is anything wrong?

2) Is there anything wrong?

3) Is there something wrong?

4) Is there anything wrong with section 3?

5) Does it look to you like section 3 could have a problem?

If a problem solving system accepts sentence 1, it should also accept the
modifications given in sentences 2 and 3. Sentence 4 presents a minor
syntactic extension which may have major repercussions in the semantics but
which should also be accepted. Sentence 5 is an example of a possible
paraphrase of sentence 4 which is beyond the intended notion of
habitability. Based on the acceptance of sentences 1-4, the user has no
reason to expect that sentence 5 will be handled.

Any sub-language which does not maintain a high degree of habitability
is apt to be worse than no natural language capability at all because, in
addition to the problem he is seeking information about, the student is
faced, sporadically, with the problem of getting the system to understand
his query. This second problem can be disastrous both because it occurs
seemingly at random and because it is ill-defined. In an informal
experiment to test the habitability of a system, the authors asked a group
of four students to write down as many ways as possible of asking a
particular question. The original idea was to determine how many of the
various paraphrases would be accepted. The students each came up with one
phrasing very quickly but had tremendous difficulty thinking of any others,
even though three of the first phrasings were different! This experience
demonstrates students' lack of ability to do "linguistic" problem
solving and points out the importance of accepting the student's first
phrasing.

(1) Another effect of poor response time which is critical to intelligent
monitoring systems is that more of the student's searching for the answer
is done internally (i.e. without using the system). This decreases the
amount of information the tutoring system receives and increases the amount
of induction that must be performed, making the problem of figuring out
what the student is doing much harder (e.g. the student won't "show his
work" when solving a problem; he will just present the answer).

An equally important aspect of the habitability problem is the
multi-sentence (or dialogue) phenomenon. When students use a system that
exhibits "intelligence" through its inference capabilities, they quickly
start to assume that the system must be intelligent in its
conversational abilities as well. For example, they will frequently delete
parts of their statements which they feel are obvious, given the context of
the preceding statements. Often they are totally unaware of such deletions
and show surprise and/or anger when the system fails to utilize contextual
information as clearly as they (subconsciously) do. The use of context
manifests itself in such linguistic phenomena as
pronominalizations, anaphoric deletions and ellipses. The following
sequence of questions exemplifies these problems:

(6)  What  is  the  population  of  Los  Angeles? 

(7)  What  is  it  for  San  Francisco? 

(8)  What  about  San  Diego? 
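
One way to picture what such a sequence demands is to treat each question as filling slots in a query, and to let an elliptical follow-up inherit any slot it leaves empty from the preceding question. The sketch below is only an illustration under that assumption; the slot names, the tiny vocabulary and the function names are hypothetical and are not taken from any system described in this report.

```python
from typing import Dict, Optional

# Hypothetical vocabulary, for illustration only.
CITIES = ["Los Angeles", "San Francisco", "San Diego"]
ATTRIBUTES = ["population", "area"]


def extract(sentence: str) -> Dict[str, str]:
    """Pull out whichever slots are explicitly mentioned in the sentence."""
    lower = sentence.lower()
    slots: Dict[str, str] = {}
    for attribute in ATTRIBUTES:
        if attribute in lower:
            slots["attribute"] = attribute
    for city in CITIES:
        if city.lower() in lower:
            slots["city"] = city
    return slots


def interpret(sentence: str, context: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Fill slots the sentence omits from the previous query (the context)."""
    slots = extract(sentence)
    if not slots:
        return None  # nothing recognized at all
    merged = {**context, **slots}  # explicit slots override inherited ones
    if "attribute" in merged and "city" in merged:
        return merged
    return None  # still incomplete: no usable context to draw on
```

Run over questions (6) to (8) in order, with each result passed along as the context for the next, all three come out as population queries; question (8) asked with an empty context yields no interpretation.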

The third requirement for a natural language processor is that it be
self-teaching. As the student uses the system, he should begin to feel the
range and limitations of the sub-language. When the student uses a
sentence that the system can't understand, he should receive feedback that
will enable him to determine why it can't. There are at least two kinds of
feedback. The simplest (and most often seen) merely provides some
indication of what parts of the sentence caused the problem (e.g. unknown
word or phrase). A more useful kind of feedback goes on to provide a
response based on those parts of the sentence that did make sense and then
indicate (or give examples of) possibly related, acceptable sentences. It
may even be advantageous to have the system recognize common unacceptable
sentences and, in response to them, explain why they are not in the
sub-language. (See Chapter 6 for further discussion of this point.)
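
The two kinds of feedback just described can be sketched in a few lines. This is a minimal illustration, not the mechanism of any system in this report: the vocabulary, the example sentences and the function name are all hypothetical. The first return value corresponds to the simplest feedback (which words were not understood); the second corresponds to the more useful kind (acceptable sentences related to the parts that were understood).

```python
import re
from typing import Dict, List, Tuple

# Hypothetical sub-language vocabulary, for illustration only.
VOCABULARY = {"what", "is", "the", "voltage", "current", "at", "through", "r5"}

# Hypothetical acceptable sentences, indexed by a word they illustrate.
EXAMPLES: Dict[str, str] = {
    "voltage": "What is the voltage at R5?",
    "current": "What is the current through R5?",
}


def feedback(sentence: str) -> Tuple[List[str], List[str]]:
    """Return (unknown words, related acceptable sentences) for an input."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    unknown = [w for w in words if w not in VOCABULARY]
    related = [EXAMPLES[w] for w in words if w in EXAMPLES]
    return unknown, related
```

For instance, `feedback("What is the potential voltage at R5?")` reports "potential" as the unknown word and offers the stored voltage question as a related, acceptable sentence.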

The fourth requirement for a natural language system is that it be
aware of ambiguity. Natural language gains a good deal of flexibility and
power by not forcing every meaning into a different surface structure.
This means that the program that interprets natural language sentences
must be aware that more than one interpretation is possible. For example,
when asked:

(9) Was John believed to have been shot by Fred?

one of the most potentially disastrous responses is "Yes". The user may
not be sure whether Fred did the shooting or the believing or both. More
likely, the user, being unaware of any ambiguity, assumes an interpretation
that may be different from the system's. If the system's interpretation is
different, the user thinks he has received the answer to his query when in
fact he has received the answer to a completely independent query.

Either  of  the  following  is  a much  better  response: 

(10)  Yes,  it  is  believed  that  Fred  shot  John. 

(11)  Yes,  Fred  believes  that  John  was  shot. 

The system need not necessarily have tremendous disambiguation skills, but
it must be aware that misinterpretations are possible and inform the user
of its interpretation. In those cases where the system makes a mistake the
results may be annoying but should not be catastrophic.

This report presents the development of a technique that we have named
"semantic grammars" for building natural language processors that satisfy
the above constraints. Chapter 2 discusses other systems which attack some
of these problems. Chapter 3 presents a dialogue from the "intelligent"
CAI system SOPHIE, which we used to refine and demonstrate this technique.
This dialogue provides concrete examples of the kinds of linguistic
capabilities that can be achieved using semantic grammars. Chapter 4
describes semantic grammar as it first evolved in SOPHIE, and points out
how it allows semantic information to be used to handle dialogue
constructs, and to allow the directed ignoring of words in the input.
Chapter 5 discusses the limitations that were encountered in the evolution
of semantic grammars in SOPHIE as the range of sentences was increased and
how these might be overcome by using a different formalism, augmented
transition networks (ATN). Chapter 5 also reports on the conversion of the
SOPHIE semantic grammar to an ATN, and the extensions to the ATN formalism
which were necessary to maintain the solutions presented in Chapter 4.
Chapter 5 also includes comparison timings between the two versions of the
natural language processor. Chapter 6 describes experiences we have had
with SOPHIE, and presents techniques developed to handle problems in the
area of non-understood sentences. Chapter 7 suggests directions for future
work.

Chapter 2
RELATED SYSTEMS


In this chapter we will describe a number of different techniques that
have evolved from research in the area of natural language understanding as
applied to practical tasks. Our purpose is to describe a set of techniques
that have been developed to handle natural language input throughout a
range of complexity. We also seek to dispel the idea that there is a
"natural language" as it applies to interfacing to computer systems, or
that there exists one "best" technique for every application.

KEYWORD  SCHEMES 

Perhaps the oldest and simplest method of dealing with unrestricted
natural language is keyword parsing. The technique was introduced
by Weizenbaum (1966a) and has been used and extended by others (e.g.,
Weizenbaum 1966b, Brown et al. 1973, Shapiro et al. 1976, Colby et al.
1974). Using this parsing scheme, an input sentence is searched for "key"
words. Each keyword is associated with a collection of patterns that are
then tested against the complete input. If a pattern matches, an action
associated with that pattern (typically a reassembly rule which constructs
an output sentence by reassembling pieces of input) is executed. This
action represents the "meaning" of the sentence to the system (i.e. the
sentence's semantics).

Keyword analysis schemes have the advantage of being fast and of
allowing the user great freedom of expression, since any number of
extraneous words can be included as long as the keywords appear. A
particular parser can also be changed easily (by adding new rules) until
such time as the rules begin interacting, at which point it is unclear
which rule to use. When interactions do begin to occur, keywords can be
assigned an "importance" number and the rule with the highest number can be
used. However, conflicts may still arise when different keywords of equal
importance appear in the same sentence.
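
The scheme just described (keywords, patterns tested against the whole input, reassembly actions, and "importance" numbers to arbitrate between rules) can be sketched as follows. This is a minimal illustration and not a reconstruction of any of the cited programs; the rules, the output strings and the use of regular expressions as the pattern language are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rule:
    keyword: str                            # word that triggers the rule
    importance: int                         # higher wins when several keywords appear
    pattern: str                            # regular expression tested against the whole input
    reassemble: Callable[[re.Match], str]   # builds the action from the matched pieces


# Hypothetical rules, for illustration only.
RULES: List[Rule] = [
    Rule("voltage", 2, r"voltage (?:at|of) (\w+)",
         lambda m: f"MEASURE VOLTAGE {m.group(1).upper()}"),
    Rule("wrong", 1, r"\bwrong\b",
         lambda m: "CHECK FOR FAULTS"),
]


def keyword_parse(sentence: str) -> Optional[str]:
    """Return the action of the most important rule whose pattern matches."""
    text = sentence.lower()
    words = set(re.findall(r"\w+", text))
    # Keep only rules whose keyword is present, most important first.
    # The sort is stable, so rules of equal importance fall back on rule
    # order, which is exactly the arbitrary tie-break the text warns about.
    candidates = sorted((r for r in RULES if r.keyword in words),
                        key=lambda r: -r.importance)
    for rule in candidates:
        match = re.search(rule.pattern, text)
        if match:
            return rule.reassemble(match)
    return None
```

Extraneous words are tolerated automatically: "Please tell me the voltage at R5, is anything wrong?" triggers both keywords, and the higher importance of "voltage" decides which action is taken.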

Keyword techniques work well in situations where the actions that the
system wishes to take in response to a sentence correspond in a simple way
to the words (i.e. the concepts are not typically expressed as multiple
word phrases, and words do not have multiple interpretations). However,
they are weak in situations in which concepts are complex enough to
require embedding or in which quantification(2) is required, since their
semantic interpretation is essentially one level. In these cases, keyword
patterns become more cumbersome and inefficient to use than more structural
techniques. For example, consider the sentence:

(1) I think Q5 has an open emitter and a shorted base collector junction.

To recognize this sentence requires a very detailed keyword pattern which
could be "keyed" equally well, or equally poorly, off any of the words:
think, Q5, open, emitter, shorted, base or collector. The main failing of
the keyword technique is that it is incapable of capturing any of the
structure of the language it is trying to characterize.

PARRY 

PARRY is an ongoing project to develop a dialogue system that simulates
paranoid behavior (Colby 1973; Colby et al. 1974). The system must respond
to any possible question and must "understand" the questions well enough to
exhibit paranoid behavior. To these ends, Colby has extended the keyword
parsing techniques introduced by Weizenbaum by adding a second level of
matching. After a preprocessing phase collapses compound words,
canonicalizes similar words, performs minor spelling correction and deletes
unrecognized words, the input is segmented at certain keyword
boundaries.(3) Each segment is then matched against a collection of
segment patterns. The resulting list of recognized segments is then
matched to a collection of complex patterns. Patterns have reassembly
rules associated with them that construct the response.

(2) Quantification refers to the problem of having a noun phrase that can
range over a set of values, e.g. "some cars have engines", "all cars have
engines". One of the problems with quantification is determining the scope
of the quantification with respect to the rest of the sentence, especially
when the rest of the sentence contains another quantifier.

(3) The fragmentation technique (which is critical to proper operation) was
developed by Wilks working in machine translation (1973a, 1973b). The list
of segmentation words includes punctuation marks, subjunctives,
conjunctions and prepositions.

Two important restrictions that should be placed on the application of
keyword schemes to avoid misunderstandings (i.e. to avoid having patterns
apply when they shouldn't) have arisen from Colby's work. One is that, at
most, one element should be ignored at each level of matching. Segment
matches should account for all but one word. Complex patterns should
account for all but one segment. The other restriction is that patterns
should require that their elements occur in a particular order. The
following example (from Colby et al. 1974) demonstrates the usefulness of
ignoring words such as "well" in sentence 3, and the importance of word
order; without word order restrictions, any pattern that matched 2 would
also match 3.

(2)  Are  you  well? 

(3) Well, are you?
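
The effect of the two restrictions on these sentences can be checked with a small sketch. It is an illustration of the restrictions only, not PARRY's actual matcher: the function names are hypothetical, patterns are plain word lists, and the limit of one ignored word is passed as a parameter.

```python
import re
from typing import List


def tokenize(sentence: str) -> List[str]:
    """Lower-case the sentence and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", sentence.lower())


def ordered_match(pattern: List[str], words: List[str], max_ignored: int = 1) -> bool:
    """True if the pattern words occur in order, with at most max_ignored extra words."""
    matched = 0
    ignored = 0
    for word in words:
        if matched < len(pattern) and word == pattern[matched]:
            matched += 1
        else:
            ignored += 1
    return matched == len(pattern) and ignored <= max_ignored


def unordered_match(pattern: List[str], words: List[str]) -> bool:
    """True if every pattern word appears somewhere, regardless of order."""
    return set(pattern) <= set(words)
```

With the pattern `["are", "you", "well"]`, the ordered matcher accepts sentence 2 and rejects sentence 3, whereas the unordered matcher accepts both. With the pattern `["are", "you"]`, the ordered matcher accepts sentence 3 by ignoring the single word "well".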

PARRY has demonstrated the capability of dealing with a relatively
large number of concepts at a shallow level. The power in PARRY's approach
lies in its ability to tolerate unknown words. As mentioned, this
fuzziness is implemented by allowing the deletion of single elements from
both levels of matching. Unfortunately the underlying semantics of PARRY's
task, indeed the goals of the task itself, are vague, which makes
attributes such as scope and habitability hard to evaluate. Furthermore,
the two-level pattern matching technique lacks the precision required in a
problem solving situation in which many regularities cannot be captured by
one-level embedding.


NLPQ 

Heidorn (1972, 1974, 1975) developed an automatic programming system
called NLPQ which allows users to describe simulation problems in English.
The system takes an English partial description of a problem and fits it
into an internal description language, building pieces of the problem.
From the partial internal description, questions are generated that request
missing pieces of information. When the description is complete, the
system can generate a GPSS program or an English description of the model
it has built from the user's description. The user can also ask questions
about the present model, and make changes and additions to it. The English
processing is done using augmented phrase structure rules. The phrase
structure component is syntax-based (it looks for things like noun
phrases), with semantic restrictions being carried along in features that
are tested in conditions on the phrase structure rules. The structure
building augmentations create semantic/conceptual network structures,
called segments, that represent the semantics of the phrase. Much of the
system's success appears to be due to the close match between the structure
of segments and the way English is used to describe modelling problems. No
information on the use of NLPQ by naive users has been published, so it is
difficult to evaluate the system's habitability.

CONSTRUCT 

CONSTRUCT is a general system for natural language processing
developed at the Institute for Mathematical Studies in the Social Sciences
at Stanford University (Smith et al. 1974). Its major application is in a
text-based, question answering system for elementary mathematics (Smith,
N.W. 1974). The system answers questions such as:

(4)  Are  there  any  even  prime  numbers  that  are  greater  than  2? 

(5)  Is  the  sum  of  5 and  2 less  than  the  product  of  5 and  2 but  greater 
than  the  difference  of  5 and  2? 

The semantic basis of the system is a collection of procedures for
generating and manipulating sets and numbers. The semantics of question 4
would be "are there any elements in the set created by intersecting the set
of even numbers, the set of prime numbers and the set of numbers greater
than 2?" As all of the sets in the example are infinite, the procedures
know how to deal with intensional as well as extensional descriptions of
sets.
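
The distinction between intensional and extensional descriptions can be made concrete with a short sketch. It is not CONSTRUCT's implementation, whose procedures are not described here; it simply assumes that an intensional set is represented by a membership predicate, that intersection is the conjunction of predicates, and that an extensional description is obtained by enumerating members up to a bound. A bounded enumeration is evidence, not a proof, about an infinite set. All names are hypothetical.

```python
from typing import Callable, List

# An intensional set is represented here by its membership test.
IntSet = Callable[[int], bool]


def is_even(n: int) -> bool:
    return n % 2 == 0


def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


def greater_than(k: int) -> IntSet:
    """The (infinite) set of integers greater than k."""
    return lambda n: n > k


def intersect(*sets: IntSet) -> IntSet:
    """Intersection of intensional sets: membership in every one of them."""
    return lambda n: all(s(n) for s in sets)


def members_below(s: IntSet, limit: int) -> List[int]:
    """Extensional view of an intensional set, restricted to 0 <= n < limit."""
    return [n for n in range(limit) if s(n)]
```

Under these assumptions, question 4 becomes `members_below(intersect(is_even, is_prime, greater_than(2)), limit)`, which is empty for any limit tried; dropping the last condition leaves the single member 2.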

The meaning of a sentence is determined by the following process.
First a preprocessing phase occurs during which (1) abbreviations are
expanded, (2) synonyms are canonicalized, (3) compound words and common
phrases are collapsed to a single word representation, (4) noise words are
eliminated and (5) each word is replaced by its lexical category. The
input is then parsed with a context-free grammar, with the semantic
interpretation occurring in parallel via semantic construction functions
associated with each grammar rule. Whereas this procedure is clearly
inadequate if a traditional syntactic grammar is used (no reasonable
semantic function could be associated with the rule S := NP VP), the
CONSTRUCT grammar is built around the semantic rules using categories that
capture concepts in the application domain. For example, the grammar
contains the grammatical category SUBST which corresponds to the semantic
concept of a constructive set. This cuts across traditional category
boundaries as seen in the sentences from (Smith et al. 1974):

Is  2 a factor  of  9? 

How  many  factors  of  12  are  even? 

Give me the factors of 12 that are between 1 and 6.

The  underlined  portions  would  all  be  parsed  into  the  SUBST  category, 
although  their  traditional  categories  would  be  noun  phrase,  adjective,  and 
prepositional  phrase. 

RENDEZVOUS 

Codd (1974) is designing a natural language system, called RENDEZVOUS,
to support the needs of casual users of data bases. One problem that Codd
has addressed, which has been neglected in previous systems, is what action
to take if a user's query is beyond the restricted language understood by
the system. A central notion in Codd's proposed solution to this problem
is that of a "clarification dialogue": a system-initiated dialogue that
includes queries about an unacceptable utterance and that attempts to arrive
at the user's meaning. Codd points out that a clarification dialogue must be
embarked upon very carefully. For example, if the system encounters the
unknown word "concerning", one of the worst possible responses is "What do
you mean by the word 'concerning'?" Almost any response to such a question
would be beyond the capabilities of the system. Any clarification dialogue
must be of "bounded scope" and guided by those parts of the query which the
system can understand. RENDEZVOUS also employs re-statement of a user's
query to confirm the intent of the query and to point out ambiguities. The
range of language accepted by RENDEZVOUS, indeed even the method used to
extend the range, is unclear. The aspect of RENDEZVOUS that is of interest
here is the extent to which it has been designed as a "friendly" system.

LUNAR 

The LUNAR system (Woods 1973a; Woods et al. 1972) is a natural
language understanding implementation that combines a general semantic
interpretation mechanism (Woods 1967, 1968) with a large scale grammar of
English (Woods 1970; Woods et al. 1972). LUNAR was designed to allow a
lunar geologist to use English to query the chemical analysis data
collected from the moon missions. Typical questions the system answers
are:

What is the average concentration of aluminium in high alkali rocks?

Which  samples  have  greater  than  20%  modal  Plagioclase? 

The processing of a query occurs in three major phases. During the
first phase, the syntactic component derives the "deep structure" of the
sentence.(4) The syntactic component uses a general transformational
grammar of English syntax expressed as an augmented transition network (see
Chapter 5). In the second phase a general, rule-driven semantic
interpretation procedure produces the representation of the meaning of the
sentence as a program in a formal retrieval language.(5) The semantic
interpretation rules are tree-structured pattern-matching rules that are
used in groups to extract the meaning of different pieces of the syntax
tree. The third phase is the execution of the formal expression to produce
the answer to the request. The formal query language is a generalization
of the predicate calculus that has been carefully designed to allow natural
translation from English. The strength of the LUNAR system lies in its
mechanisms to deal with quantification, conjunction, and relative clauses,
and these are direct results of the carefully designed formal query
language.

Discussion 

The notion of an augmented phrase structure grammar provides a useful
base for comparison between these systems.(6) An augmented phrase
structure grammar contains two components. One is a set of context-free
phrase structure rules. The other is a corresponding set of functions,
sometimes arbitrary, sometimes restricted, augmenting each of the rules,
that can be used to block the application of the context-free rules and to
maintain structures. While the paradigm of augmenting phrase structure
grammars is followed by a large number of natural language systems,
important differences exist with respect to what type of information is
encoded in the grammar. For example, the LUNAR system uses a purely
syntactic grammar(7) and uses the augments to perform syntactic operations
such as subject-verb agreement and to maintain the structure of the
syntactic tree. NLPQ uses a syntactic grammar restricted by features that
are usually semantic and uses the augments to perform parallel semantic
interpretation. CONSTRUCT performs the semantic interpretation in parallel
with a set of context-free rules that are semantically oriented. PARRY's
patterns, if viewed as limited phrase-structure grammar rules, are
directly linked to the semantics of the system. The decision about how
much semantic information to encode in the grammar is a trade-off between
efficiency and generality. Each of the systems presented here represents a
defensible position along this spectrum.

(4) This is the linguistic deep structure hypothesized by Chomsky (Chomsky
1965) which has a central role in the theory of transformational grammar.

(5) The notion that the meaning of a sentence is a program is generally
called "procedural semantics". Procedural semantics is in general use for
question answering applications. It does not, however, constitute a
complete theory of meaning. In particular it does not account for such
phenomena as declaratives, uses of temporal references, and belief
structures.

(6) The idea of associating additional information with a phrase structure
grammar has appeared in various forms since early compiling systems (Irons
1961).
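
The two components of an augmented phrase structure grammar named in this discussion (context-free rules, plus a function attached to each rule that can block the rule and build structure) can be sketched as follows. This is a toy illustration, not the grammar of LUNAR, NLPQ or CONSTRUCT: the lexicon, the three rules, the naive bottom-up reduction strategy and all names are assumptions made for the example. The augment on the S rule performs a subject-verb agreement check, the kind of syntactic operation attributed above to LUNAR's augments.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Node = Dict[str, str]

# Hypothetical lexicon, for illustration only.
LEXICON: Dict[str, Node] = {
    "the": {"cat": "DET"},
    "resistor": {"cat": "N", "number": "sg"},
    "resistors": {"cat": "N", "number": "pl"},
    "is": {"cat": "V", "number": "sg"},
    "are": {"cat": "V", "number": "pl"},
    "open": {"cat": "ADJ"},
}


def np_augment(det: Node, noun: Node) -> Optional[Node]:
    return {"number": noun["number"], "head": noun["word"]}


def vp_augment(verb: Node, adj: Node) -> Optional[Node]:
    return {"number": verb["number"], "pred": adj["word"]}


def s_augment(np: Node, vp: Node) -> Optional[Node]:
    if np["number"] != vp["number"]:
        return None  # block the context-free rule: agreement fails
    return {"subject": np["head"], "pred": vp["pred"]}


# Each rule: (left-hand side, right-hand side categories, augment function).
RULES: List[Tuple[str, List[str], Callable[..., Optional[Node]]]] = [
    ("NP", ["DET", "N"], np_augment),
    ("VP", ["V", "ADJ"], vp_augment),
    ("S", ["NP", "VP"], s_augment),
]


def reduce_once(nodes: List[Node]) -> Optional[List[Node]]:
    """Apply the first rule whose right-hand side matches and whose augment allows it."""
    for lhs, rhs, augment in RULES:
        for i in range(len(nodes) - len(rhs) + 1):
            window = nodes[i:i + len(rhs)]
            if [n["cat"] for n in window] == rhs:
                built = augment(*window)
                if built is not None:
                    return nodes[:i] + [dict(built, cat=lhs)] + nodes[i + len(rhs):]
    return None


def parse(sentence: str) -> Optional[Node]:
    words = re.findall(r"[a-z]+", sentence.lower())
    if any(w not in LEXICON for w in words):
        return None
    nodes: List[Node] = [dict(LEXICON[w], word=w) for w in words]
    while True:
        reduced = reduce_once(nodes)
        if reduced is None:
            break
        nodes = reduced
    if len(nodes) == 1 and nodes[0]["cat"] == "S":
        return nodes[0]
    return None
```

"The resistors are open" reduces to a single S node, while "The resistor are open" satisfies the context-free rules but is blocked by the augment on S. Moving the lexicon and rules from syntactic categories toward domain concepts is the design choice on which the systems above differ.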

When we began developing the SOPHIE system(8) we explored the
possibility of using, intact, the syntactic component of the LUNAR system.
Since the LUNAR syntactic component was building a linguistically motivated
description, as opposed to the task oriented descriptions being built by
the other systems, we felt its transferability to other domains would be
high. We found the grammar to be quite adequate, parsing many of the most
complicated sentences we felt SOPHIE would ever need to understand.
Unfortunately, on simple sentences it provided more information about the
sentence than we needed. For example, tense information was seldom needed
and, in those cases where it was needed, it could be extracted from the
relationships between concepts. The quantification and relative clause
mechanisms were oriented towards Woods' formal query language, which was not
natural for our use. The use of conjunction in our domain is
straightforward and relatively predictable, unlike its use in the LUNAR
domain. All in all we had the feeling of using a microscope when we only
needed a magnifying glass! The underlying semantic structure of our system
could not take advantage of such detail. Added detail is acceptable (it
can always be ignored) except that the perception of such detail takes
time, which is a scarce commodity. The LUNAR system was taking 2 or 3
seconds to syntactically parse a sentence and another 5 to semantically
interpret it. This experience led us to explore ways in which the
semantics of the system could be used to speed the understanding process.

(7) The augmented transition network is an extension of a recursive
transition network that has the power of a phrase structure grammar. For
this reason we can classify it here as using an augmented phrase structure
grammar. We will argue later that the transition network has conceptual
advantages over phrase structure rules, but this does not affect this
discussion, which points out the difference in the kind of information
captured in the grammar.

(8) A Sophisticated Instructional Environment for teaching electronic
troubleshooting. Chapter 3 provides examples of SOPHIE's language
requirements.

The technique we developed (described in Chapter 4) has much in common
with both NLPQ and CONSTRUCT. However, significant differences arise from
the emphasis we have placed on dealing with dialogues, and on the
construction of a friendly system. This has caused us to exploit two uses
of semantics (during parsing) not found in these other systems. One is the
insight provided into the nature of ellipsis and deletion in dialogues.
The other is the basis provided for characterizing a habitable language.
In Chapter 4, we shall discuss our concept of a semantic grammar and how it
allows exploitation of these two advantages. Before we get into the
details of how this is accomplished, we present in the next chapter an
example of what has been accomplished.

Chapter 3
SAMPLE DIALOGUE


Before delving into the structural aspects and technical details of
the semantic grammar technique, we would first like to provide a concrete
example of the dialogues it has supported. This chapter presents an
annotated dialogue of a student using the "intelligent" CAI system
SOPHIE.(9) SOPHIE was developed to explore the use of artificial
intelligence techniques in providing tutorial feedback to students engaged
in problem solving activities. The particular problem solving activity
that SOPHIE is concerned with is the troubleshooting of a malfunctioning
piece of electronic equipment. SOPHIE models the piece of equipment and
answers the student's requests for measurements and other information to
aid him in debugging the equipment. More important, throughout the problem
solving session, SOPHIE can evaluate the logical consistency of a student's
hypothesis or generate hypotheses which are consistent with the behavior
the student has thus far observed.(10) In the dialogue, the student's
typing is underlined. Even though the dialogue deals with electronic
jargon, the linguistic issues it exemplifies occur in all domains. The
annotations (lower case, indented) attempt to point out these problems and
should be understandable to the non-electronics oriented reader.

WELCOME  TO  SOPHIE  - A SIMULATED  ELECTRONICS  LABORATORY. 

The circuit (Figure 3.1) is based on the Heathkit IP-28 power supply.
The IP-28 is a reasonably sophisticated power supply with both current
limiting and voltage limiting behavior. These two interrelated
feedback loops make troubleshooting this circuit non-trivial.

>>INSERT A FAULT

The student tells SOPHIE to give him a fault which he can
troubleshoot. SOPHIE randomly selects a fault, inserts it into a model
of the instrument and tells the student how the front panel controls
are presently set.

THE INSTRUMENT HAS A FAULT AND ITS PRESENT CONTROL SETTINGS ARE:

CC    1.0   CURRENT CONTROL-FROM 0.0 (MIN CURRENT) TO 1.0
CR    HIGH  CURRENT RANGE SWITCH, HIGH=1 AMP, LOW=.1 AMP
LOAD  1000  LOAD RESISTANCE IN OHMS
VC    1.0   VOLTAGE CONTROL-FROM 0.0 (MIN VOLTAGE) TO 1.0
VR    HIGH  VOLTAGE RANGE SWITCH, HIGH=30, LOW=10 VOLTS


(9) The dialogue is intended to demonstrate SOPHIE's linguistic
capabilities and, while it touches upon the major features of SOPHIE, it is
not meant to exhibit the logical or deductive capabilities of the system.

(10) The reader is encouraged to see (Brown and Burton 1975) for further
examples and descriptions of SOPHIE's tutorial and inferential
capabilities.

>>WHAT IS THE OUTPT?

OUTPT => OUTPUT

THE  OUTPUT  VOLTAGE  IS  1.2  VOLTS. 

SOPHIE  corrects  the  student's  typing  mistake. 

Notice  that  "the  output"  was  disambiguated  as  the  output  voltage. 

IF  YOU  ARE  NOT  SURE  WHETHER  A VALUE  IS  CORRECT,  YOU  CAN  ASK  "IS  THAT 
VALUE  CORRECT?" 

The  system  points  out  a feature  that  the  student  may  not  otherwise  be 
aware  of.  "Correct"  here  means  what  it  is  in  a working  instrument. 

>>IS  THAT  RIGHT 

IN A WORKING CIRCUIT THE OUTPUT VOLTAGE IS 30.1 VOLTS.

THAT  VALUE  IS  INCORRECT. 

"That"  here  refers  to  the  value  of  the  previous  measurement.  SOPHIE, 
in addition to the faulted instrument, has a working circuit which can
be used to take reference measurements.

>>WHAT IS THE VOLTAGE BETWEEN THE CURRENT LIMITING TRANSISTOR
...AND THE CONSTANT CURRENT SOURCE?

THE VOLTAGE AT N4 IS 2.[illegible] VOLTS.

In addition to specific part names, SOPHIE recognizes references to
parts by their section. Notice that although the sentence says
"voltage between x and y", it really means "voltage at z with respect
to ground". The output routines rephrase the input in a simpler form
(i.e. between x and y reduces to N4).

>>WHAT IS IT BETWEEN THERE AND THE BASE OF Q6

THE VOLTAGE BETWEEN N4 AND N5 IS 1.28 VOLTS.

Here "it" refers to "the voltage" and "there" refers to N4.
Also notice that "between" has to be treated differently,
i.e. two nodes are explicit now.

>>IS THERE ANY RIPPLE IN THE VOLTAGE AT THE BASE OF Q4?

I'M SORRY, I DO NOT HANDLE AC CHARACTERISTICS.

SOPHIE recognizes that the student is moving into an area that it is
unable to handle. This is currently done by keyword recognition (e.g.
ripple) but points out a technique that should be incorporated in
other aspects of the system as well.

>>WHAT IS THE BASE CURRENT OF Q3

THE BASE CURRENT OF Q3 IS 1[illegible] MICROAMPS.

SOPHIE can answer questions about current, resistance, and power
dissipation as well as voltage. The output and input routines know
about unit prefixes.

>>THRU THE EMITTER?

THE EMITTER CURRENT OF Q3 IS -.26 MILLIAMPS.

This is a type of sentence fragment that occurs frequently in
dialogues that we will refer to as an ellipsis. Rather than
specifying a complete thought, an ellipsis only gives that piece of
the thought which contrasts with an earlier thought. Notice that the
referent cannot be determined on syntactic grounds (as is possible
with the ellipsis "the emitter current"), yet this is the sort of
statement a student might make if the last answer had caused a slight
shifting of his thinking.

>>COULD THE PROBLEM BE THAT Q5 IS BAD?

PLEASE BE MORE SPECIFIC ABOUT HOW Q5 IS FAULTED.

The student develops an idea about what is wrong and asks to have it
evaluated.

>>COULD IT BE SHORTED?

THE SHORTED MODES OF A TRANSISTOR CURRENTLY IMPLEMENTED ARE: ALL
JUNCTIONS SHORTED OR ONE JUNCTION SHORTED WITH THE OPPOSITE
TERMINAL MODELLED OPEN.

A hypothesis must be completely (or non-ambiguously) specified before
evaluation can take place. Even though the last statement generated
an error condition, it is still used to determine possible pronoun
referents.

SHOULD  Q5  HAVE  ALL  JUNCTIONS  SHORTED? 

> NO 

WHICH TERMINAL SHOULD BE MODELLED OPEN?

> THE BASE

OK, I'LL LET Q5 HAVE ITS EMITTER COLLECTOR JUNCTION SHORTED AND ITS BASE OPEN

At this point, SOPHIE looks back over the student's measurements to
determine if he should have "known better". See (Brown and Burton
1975) or (Brown et al. 1975) for further description.

HMMM, LET ME COMPUTE ON THAT FOR A MINUTE.

When the system knows there may be a long wait, a "delaying" message is
printed to let the student know something is happening.

WHEN THE SETTINGS WERE:
CC=1.0, CR=HIGH, LOAD=1000, VC=1.0, VR=HIGH.
THAT FAULT CONFLICTS WITH THE FOLLOWING MEASUREMENT.
YOU OBSERVED THAT THE VOLTAGE BETWEEN N9 AND N5 WAS 1.28 VOLTS.
IF Q5 HAD ITS EC SHORTED AND ITS BASE OPEN IT WOULD BE 1.1 MILLIVOLTS.
IN A WORKING CIRCUIT IT IS 1.93 VOLTS.

BUT THAT FAULT DOES EXPLAIN THE VALUES YOU OBSERVED FOR
THE OUTPUT VOLTAGE, THE VOLTAGE AT N9, THE VOLTAGE BETWEEN N9 AND N10,
THE EMITTER CURRENT OF Q3 AND THE BASE CURRENT OF Q3.

>>WHAT COULD BE WRONG

The student cannot go further and asks for assistance. SOPHIE
proposes some possible hypotheses for the student to explore.

THERE ARE LOTS OF FAULTS WHICH WOULD EXPLAIN THE MEASUREMENTS YOU
HAVE TAKEN. HERE ARE SOME POSSIBILITIES:

THE RESISTANCE OF R8 CHANGES
THE LEAKAGE RESISTANCE OF C9 CHANGES
THE BETA OF Q3 CHANGES
R15 IS OPEN
Q6 IS FAULTED

>>IF R8 WERE 30K, WHAT WOULD THE OUTPUT VOLTAGE BE?

IF THE VALUE OF R8 WERE 30000
UNDER THE PRESENT SETTINGS, THE OUTPUT VOLTAGE WOULD BE .909 VOLTS.
IN A WORKING CIRCUIT IT IS 30.1 VOLTS.

The student doesn't believe one of the possibilities so he checks it
using a hypothetical statement. Even though the voltage in this case
and that in the case of a faulted circuit are not the same, they
qualitatively agree.

>>REPLACE Q6

The student guesses maybe Q6 is bad and tries to replace it. When the
student replaces a part SOPHIE queries him to insist that he know how
it is faulted.

I AM GOING TO ASK YOU SOME QUESTIONS ABOUT HOW Q6 IS FAULTED.

ARE ANY JUNCTIONS SHORTED?

> YES

WHICH JUNCTIONS ARE SHORTED?

> ALL

If the response to a query is not understood, further specification of
the expected response is given.

PLEASE SPECIFY THE JUNCTIONS AS EC EB AND CB (SEPARATED BY SPACES).

WHICH JUNCTIONS ARE SHORTED?

> EC EB AND CB

Q6 IS NOT FAULTED IN THAT MANNER.

The part would be replaced if any of the junctions had been shorted.
SOPHIE would normally have critiqued the student's hypothesized fault.
This was not included in the dialogue to save space.

Q6 NOT REPLACED.

>>VOLTAGE DROP ACROSS R5

THE VOLTAGE ACROSS R5 IS -.028 MILLIVOLTS.

Some noun phrase utterances are recognized as questions.

>>WHAT ABOUT R3?

THE VOLTAGE ACROSS R3 IS 71.6 VOLTS.

Another ellipsis. The context of the previous question is used to
determine "what" about R3.


>>REPLACE  R3 

IS R3 OPEN OR SHORTED OR DOES IT HAVE THE WRONG VALUE?

> IT IS OPEN

R3 REPLACED

>>IS THERE ANYTHING ELSE WRONG

NO

After fixing the fault, the student uses SOPHIE to explore other
aspects of the circuit's behavior.

>>IF C2 WERE LEAKY, WHAT WOULD THE VOLTAGE ACROSS IT BE?

The referent to "it" in this case occurs earlier in the same sentence.

IS IT OKAY IF I USE 10000 FOR THE LEAKAGE RESISTANCE OF C2?

> YES

IF THE LEAKAGE RESISTANCE OF C2 WERE 10000
UNDER THE PRESENT SETTINGS, THE VOLTAGE ACROSS C2 WOULD BE 26.3 VOLTS.
IN A WORKING CIRCUIT IT IS 28.9 VOLTS.

>>WHAT HAPPENS IF C1 SHORTS?

UNDER THE SETTINGS:
CC=1.0, CR=HIGH, LOAD=1000, VC=1.0, VR=HIGH,
IF C1 WERE SHORTED, D3 WOULD BECOME OVERLOADED.

In the case where nothing more interesting happens, "what happens"
questions default to the output voltage.

>>IF Q6 HAS AN OPEN EMITTER AND A SHORTED BASE COLLECTOR JUNCTION
...WHAT HAPPENS TO THE VOLTAGE BETWEEN ITS BASE AND THE JUNCTION OF
...THE VOLTAGE SECTION AND THE VOLTGEA REFERENCE SOURCE?

VOLTGEA => VOLTAGE

Several examples of the use of conjunction. Imagine the student's
frustration if he is forced to type this query over because of
spelling mistakes.

IF Q6 HAD ITS BASE COLLECTOR JUNCTION SHORTED AND ITS EMITTER OPEN,
UNDER THE PRESENT SETTINGS:
CC=1.0, CR=HIGH, LOAD=1000, VC=1.0, VR=HIGH
THE VOLTAGE BETWEEN N5 AND N13 WOULD BE THE SAME, I.E. 28.9 VOLTS.

A hypothetical question is first tried under the present settings. If
no difference is noted, other settings are tried.

LET  ME  TRY  ANOTHER  CASE  WHICH  USUALLY  POINTS  OUT  PROBLEMS  IN  THAT 
SECTION  OF  THE  CIRCUIT. 

UNDER  THE  SETTINGS: 

CC=.9, CR=LOW, LOAD=10, VC=1.0, VR=LOW.
THE VOLTAGE BETWEEN N5 AND N13 WOULD BE 8.94 VOLTS.
IN A WORKING CIRCUIT IT IS 9.57 VOLTS.

>>WHAT IS THE OUTPUT VOLTAGE WHEN THE VOLTAGE CONTROL IS SET TO .5?

WITH THE VOLTAGE CONTROL NOW SET AT .5
THE OUTPUT VOLTAGE IS 15.6 VOLTS.

A hypothetical question can involve the changing of a control. If
this happens the control is left at the new setting and the student is
told so.

>>IF IT IS SET TO .7?

WITH THE VOLTAGE CONTROL NOW SET AT .7
THE OUTPUT VOLTAGE IS 21.4 VOLTS.

Context determines which measurement should be taken and which control
should be set.

WITH THE VOLTAGE CONTROL NOW SET AT .9
THE OUTPUT VOLTAGE IS 27.2 VOLTS.

>>WHAT IS THE CURRENT THRU THE CC WHEN THE VC IS SET TO 1.0?

WITH THE VOLTAGE CONTROL NOW SET AT 1.0
THE CURRENT THRU THE CURRENT CONTROL IS 30 MILLIAMPS.

CC and VC are abbreviations. Notice that controls are also "parts"
that can be used in measurements.

>>WHAT IS IT WITH IT SET TO .8?

WITH THE VOLTAGE CONTROL NOW SET AT .8
THE CURRENT THRU THE CURRENT CONTROL IS 24 MILLIAMPS.

Here the system must correctly instantiate two occurrences of "it".
The referent to the second "it" could be either CC or VC since these
are both controls.

>>GOODBYE

GOODBYE

Chapter 4
SEMANTIC GRAMMAR

In  Chapter  1 we  described  the  requirements  for  a natural  language 
processor  in  a learning  environment.  Briefly,  they  are  efficiency  and 
friendliness  over  the  class  of  sentences  that  arise  in  a dialogue 
situation.  The  major  leverage  points  we  have  that  allow  us  to  satisfy 
these  requirements  are  (1)  limited  domain,  (2)  limited  activities  within 
that  domain,  and  (3)  known  conceptualizations  of  the  domain.  In  other 
words, we know the problem area, the type of problem the student is trying
to  solve,  and  the  way  he  should  be  thinking  about  the  problem  in  order  to 
solve  it.  What  we  are  then  faced  with  is  taking  advantage  of  these 
constraints  in  order  to  provide  an  effective  communication  channel. 

Notice  that  all  of  these  constraints  relate  to  concepts  underlying  the 
student's  activities.  In  SOPHIE,  the  concepts  include  voltage,  current, 
parts, transistors, terminals, faults, particular parts (e.g. R9, Q5,
etc.),  hypotheses,  controls,  settings  of  controls,  and  so  on.  The 
(dependency)  relationships  between  concepts  include  things  such  as: 
voltage  can  be  measured  at  terminals,  parts  can  be  faulted,  controls  can  be 
set,  etc.  The  student,  in  formulating  a query  or  statement,  is  requesting 
information  or  stating  a belief  about  one  of  these  relationships  (e.g. 
"What  is  the  voltage  at  the  collector  of  Q5"  or  "I  think  R9  is  open".)  It 
occurred  to  us  that  the  best  way  to  characterize  the  statements  used  for 
this  task  was  in  terms  of  the  concepts  themselves  as  opposed  to  the 
traditional  syntactic  structures.  The  language  can  be  described  by  a set 
of  grammar  rules  that  characterize,  for  each  concept  or  relationship,  all 
of  the  ways  of  expressing  it  in  terms  of  other  constituent  concepts.  For 
example,  the  concept  of  a measurement  requires  a quantity  to  be  measured 
and  something  against  which  to  measure  it.  A measurement  is  typically 
expressed  by  giving  the  quantity  followed  by  a preposition,  followed  by  the 
thing  that  specifies  where  to  measure  (e.g.  "voltage  across  C2",  "current 
thru D1", "power dissipation of R9", etc.) These phrasings are captured in
the grammar rule:(11)

<MEASUREMENT> := <MEASURABLE/QUANTITY> <PREP> <PART>

The  concept  of  a measurement  can,  in  turn,  be  used  as  part  of  other 
concepts,  e.g.  to  request  a measurement  "What  is  the  voltage  across  C2?"; 
or  to  check  a measurement  "Is  the  current  thru  D1  correct?".  We  call  this 
type  of  grammar  a "semantic  grammar"  because  the  relationships  it  tries  to 
characterize  are  semantic/conceptual  as  well  as  syntactic. 

Semantic  grammars  have  two  advantages  over  traditional  syntactic 
grammars.  They  allow  semantic  constraints  to  be  used  to  make  predictions 
during  the  parsing  process,  and  they  provide  a useful  characterization  of 
those  sentences  that  the  system  should  try  to  handle.  The  predictive 
aspect  is  important  for  four  reasons:  (1)  It  reduces  the  number  of 
alternatives  that  must  be  checked  at  a given  time;  (2)  it  reduces  the 
amount  of  syntactic  (grammatical)  ambiguity;  (3)  it  allows  recognition  of 
ellipsed  or  deleted  phrases;  and  (4)  it  permits  the  parser  to  skip  words  at 
controlled  places  in  the  input  (i.e.  it  enables  a reasonable  specification 
of  control).  These  points  will  be  discussed  in  detail  in  a later  section. 

The  characterization  aspect  is  important  for  two  reasons:  (1)  It 
provides  a handle  on  the  problem  of  constructing  a habitable  sub-language. 
The  system  knows  how  to  deal  with  a particular  set  of  tasks  over  a 
particular  set  of  objects.  The  sub-language  can  be  partitioned  by  tasks  to 
accept  all  straightforward  ways  of  expressing  those  tasks,  but  does  not 
need  to  worry  about  others;  (2)  It  allows  a reduction  in  the  number  of 
sentences  that  must  be  accepted  by  the  language  while  still  maintaining 
habitability.  There  may  be  syntactic  constructs  that  are  used  frequently 
with  one  concept  (task)  but  seldom  with  another.  For  example,  relative 
clauses  may  be  useful  in  explaining  the  reasons  for  performing  an 
experimental  test  but  are  an  awkward  (though  possible)  way  of  requesting  a 
measurement.  By  separating  the  processing  along  semantic  grounds,  one  may 
gain  efficiency  by  not  having  to  accept  the  awkward  phrasing. 


(11) This is not actually a rule from the grammar but is merely intended to
be suggestive.

Representation of Meaning


Since  natural  language  communication  is  the  transmission  of  concepts 
via  phrases,  the  "meaning"  of  a phrase  is  its  correspondent  in  the 
conceptual  space.  The  entities  in  SOPHIE'S  conceptual  space  are  objects, 
relationships  between  objects,  and  procedures  for  dealing  with  objects. 
The  meaning  of  a phrase  can  be  a simple  data  object  (e.g.  "current  limiting 
transistor")  or  a complex  data  object  (e.g.  "C5  open",  "Voltage  at  node 
1").  The  meaning  of  a question  is  a call  to  a procedural  specialist  that 
knows  how  to  determine  the  answer.  The  meaning  of  a command  is  a call  to  a 
procedure  that  performs  the  specified  action. (12)  For  example,  the 
procedural  specialist  DOFAULT  knows  how  to  fault  the  circuit  and  is  used  to 
represent the meaning of commands to fault the circuit (e.g. "Open R9",
"Suppose  C2  shorts  and  R9  opens").  The  argument  that  DOFAULT  needs  in 
order  to  perform  its  task  is  an  instance  of  the  concept  of  faults  that 
specifies  the  particular  changes  to  be  made,  e.g.  "R9  being  open".  These 
same  concepts  of  particular  faults  also  serve  as  arguments  to  two  other 
specialists:  HYPTEST  which  determines  the  consistency  of  a fault  with 
respect  to  the  present  context,  e.g.  "Could  R9  be  open";  and  SEEFAULT 
which  checks  the  actual  status  of  the  circuit,  e.g.  "Is  R9  open?". 
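
The idea that a sentence's meaning is a call to a procedural specialist, with one fault concept serving as the argument to several specialists, can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the system's actual code; the `Fault` and `Circuit` classes, the `evaluate` function, and the reply string are hypothetical stand-ins, and HYPTEST is left out because it would need the circuit simulator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fault:
    """An instance of the fault concept, e.g. 'R9 being open'."""
    part: str
    mode: str  # 'open' or 'shorted'


class Circuit:
    """Toy stand-in for the simulated instrument (hypothetical)."""

    def __init__(self):
        self.faults = set()


def dofault(circuit, fault):
    """Command specialist: insert the fault into the circuit ('Open R9')."""
    circuit.faults.add(fault)
    return f"{fault.part} IS NOW {fault.mode.upper()}"


def seefault(circuit, fault):
    """Question specialist: report the actual status ('Is R9 open?')."""
    return fault in circuit.faults


# HYPTEST is omitted: it needs the simulator and the measurement history.
SPECIALISTS = {"DOFAULT": dofault, "SEEFAULT": seefault}


def evaluate(meaning, circuit):
    """A meaning is (specialist name, argument); evaluating it runs the call."""
    name, argument = meaning
    return SPECIALISTS[name](circuit, argument)


circuit = Circuit()
r9_open = Fault("R9", "open")
print(evaluate(("SEEFAULT", r9_open), circuit))  # status before the command
print(evaluate(("DOFAULT", r9_open), circuit))
print(evaluate(("SEEFAULT", r9_open), circuit))  # status after the command
```

The same `Fault` value is passed unchanged to both specialists, which is the point of the passage: the parser only has to build the concept once, and the choice of specialist carries the difference between a command and a question.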

Result  of  the  Parsing 

Basing the grammar on conceptual entities allows the semantic
interpretation  (the  determination  of  the  concept  underlying  a phrase)  to 
proceed  in  parallel  with  the  parsing.  Since  each  of  the  non-terminal 
categories  in  the  grammar  is  based  on  a semantic  unit,  each  grammar  rule 
can  specify  the  semantic  description  of  a phrase  that  it  recognizes  in  much 
the  same  way  that  a syntactic  grammar  specifies  a syntactic  description. 
The  construction  portion  of  the  rules  is  procedural.  Each  rule  has  the 
freedom  to  decide  how  the  semantic  descriptions,  returned  by  the 
constituent items of that rule, are to be put together to form the correct
"meaning".


(12) Declarative statements are treated as requests because the pragmatics
of the situation imply that the student is asking for verification of his
statement. For example, "I think C2 is shorted" is taken to be a request
to have the hypothesis "C2 is shorted" critiqued.

For example, the meaning of the phrase "Q5" is the data base object
Q5. The meaning of the phrase "the collector of Q5" is (COLLECTOR Q5)
where COLLECTOR is a function that returns the data base item that is the
collector of the given transistor. For a more complicated example,
consider the non-terminal <MEASUREMENT> shown in Figure 4.1.

Figure 4.1

A Semantic Grammar Rule(13)

<MEASUREMENT> := output <MEAS/QUANT> [of <TRANSFORMER>] !
                 <TRANSFORMER> <MEAS/QUANT> !
                 <MEAS/QUANT> between <NODE> and <NODE> !
                 <MEAS/QUANT> <PREP> <PART> !
                 <MEAS/QUANT> between output terminals !
                 <MEAS/QUANT> <PREP> <JUNCTION> !
                 <MEAS/QUANT> <PREP> <NODE> !
                 <JUNCTION/TYPE> <MEAS/QUANT> of <TRANSISTOR/SPEC> !
                 <TRANSISTOR/TERM/TYPE> <MEAS/QUANT> of <TRANSISTOR>

The goal for this non-terminal is to capture all of the ways that a student
can specify a measurement (voltage across D3, output current, etc.). To
specify a measurement, there must be a quantity to be measured, <MEAS/QUANT>
(e.g. voltage, current, resistance, power dissipation), and something to
measure (e.g. with respect to a part, <PART/SPEC>; a transistor junction,
<JUNCTION>; or possibly a point in the circuit, <NODE>). The rule for
<MEASUREMENT> expresses all of the ways that the student can give a
measurable quantity and also supply its required arguments. The structure
which results from <MEASUREMENT> is a function call to the function MEASURE
which supplies the quantity being measured and other arguments specifying
where to measure it. Thus the meaning of the phrase "the voltage at the
collector of Q5" is (MEASURE VOLTAGE (COLLECTOR Q5)), which was generated
from the control structure:


          measurement
          /         \
   meas/quant       node
       |              |
    voltage       terminal
                  /      \
           collector      Q5


(13) The rule is expressed in a BNF-like notation which is an abstraction
of the actual rule (see next section). Non-terminals are in capital
letters and enclosed in angle brackets. Terminals are in lower case.
Brackets enclose optional elements. Alternative right hand sides are
separated by a "!".
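
To make the correspondence between the control structure and the resulting form concrete, the following is a minimal sketch of two procedurally encoded rules, each returning its semantic description together with the position where it stopped. It is an illustration in Python rather than the system's own implementation: it covers only one alternative of <MEASUREMENT>, the word lists are small invented samples, and dropping "the" before parsing is a simplifying assumption.

```python
QUANTITIES = {"voltage": "VOLTAGE", "current": "CURRENT"}
PREPS = {"at", "across", "thru", "of"}
TERMINAL_TYPES = {"collector": "COLLECTOR", "base": "BASE", "emitter": "EMITTER"}
NOISE = {"the"}  # assumption: articles carry no meaning here and are dropped


def tokenize(text):
    """Lower-case the input, strip a question mark, and drop noise words."""
    return [w for w in text.lower().replace("?", "").split() if w not in NOISE]


def parse_terminal(toks, i):
    """<TERMINAL> := <TERM/TYPE> of <TRANSISTOR>; returns (meaning, next) or None."""
    if (i + 2 < len(toks) and toks[i] in TERMINAL_TYPES
            and toks[i + 1] == "of" and toks[i + 2].startswith("q")):
        return (TERMINAL_TYPES[toks[i]], toks[i + 2].upper()), i + 3
    return None


def parse_measurement(toks, i=0):
    """<MEASUREMENT> := <MEAS/QUANT> <PREP> <NODE>, with <NODE> a terminal."""
    if i + 1 < len(toks) and toks[i] in QUANTITIES and toks[i + 1] in PREPS:
        sub = parse_terminal(toks, i + 2)
        if sub is not None:
            node, j = sub
            # The rule itself decides how constituent meanings are combined.
            return ("MEASURE", QUANTITIES[toks[i]], node), j
    return None


print(parse_measurement(tokenize("the voltage at the collector of Q5")))
```

Because each rule builds its own result, no separate interpretation pass over a syntax tree is needed; the nested tuple plays the role of the form (MEASURE VOLTAGE (COLLECTOR Q5)).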


A careful  examination  of  Figure  4.1  reveals  that  <MEASUREMENT>  also 
accepts  "meaningless"  phrases  such  as  "the  power  dissipation  of  Node  4." 
In  addition,  it  accepts  some  meaningful  phrases  such  as  "the  resistance 
between  Node  3 and  Node  14"  which  SOPHIE  does  not  calculate.  This  results 
from  generalizing  together  concepts  which  are  not  treated  identically  in 
the  surface  structure.  In  this  case,  voltage,  current,  resistance  and 
power  dissipation  were  generalized  to  the  concept  of  a measurable  quantity. 
Allowing  the  grammar  to  accept  more  statements  and  having  the 
argument-checking  done  by  the  procedural  specialists  has  the  advantage  of 
allowing  the  semantic  routines  to  provide  the  feedback  as  to  why  a sentence 
cannot  be  interpreted  or  "understood".  It  also  keeps  the  grammar  from 
being  cluttered  with  special  rules  for  blocking  meaningless  phrases. 
Carried  to  the  limit,  the  generalization  strategy  would  return  the  grammar 
to  being  "syntactic"  again  (e.g.  all  data  objects  are  "noun  phrases").  The 
trick  is  to  leave  semantics  in  the  grammar  when  it  is  beneficial  — to  stop 
extraneous  parsings  early,  or  tighten  the  range  of  a referent  for  an 
ellipsis or deletion. This is obviously a task-specific trade-off.(14)


(14) Bobrow and Brown (1975) describe an interesting paradigm from which to
consider this trade-off.
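
The division of labor described above, in which the grammar accepts a generalized phrase and the procedural specialist checks the arguments and explains any rejection, might be sketched as follows. This is a hypothetical Python illustration: the `measure` and `respond` functions, the category names, and the wording of the messages are assumptions made for the example, and only the two cases mentioned in the text are encoded.

```python
class NotUnderstood(Exception):
    """Raised by a specialist to explain why a parsed request is rejected."""


def measure(quantity, place_kind, place):
    """Check the arguments that the grammar accepted, then build the request."""
    if quantity == "POWER DISSIPATION" and place_kind == "NODE":
        # Grammatical under the generalized rule, but meaningless.
        raise NotUnderstood(
            f"POWER DISSIPATION IS NOT DEFINED AT A NODE ({place}).")
    if quantity == "RESISTANCE" and place_kind == "NODE-PAIR":
        # Meaningful, but outside what the system calculates.
        raise NotUnderstood("I DO NOT CALCULATE RESISTANCE BETWEEN NODES.")
    return ("MEASURE", quantity, place)


def respond(quantity, place_kind, place):
    """Return the request, or the specialist's explanation of the failure."""
    try:
        return measure(quantity, place_kind, place)
    except NotUnderstood as reason:
        return str(reason)


print(respond("VOLTAGE", "NODE", "N4"))
print(respond("POWER DISSIPATION", "NODE", "N4"))
print(respond("RESISTANCE", "NODE-PAIR", ("N3", "N14")))
```

Keeping the checks in the specialist means the student receives a reason rather than a bare parse failure, at the cost of letting the parser do work on phrases that are later refused.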




The  relationship  between  a phrase  and  its  meaning  is  usually 
straightforward.  However,  it  is  not  limited  to  simple  embedding.  Consider 
the  phrases  "the  base  emitter  of  Q5  shorted"  and  "the  base  of  Q5  shorted  to 
the emitter". The thing which is "shorted" in both of these phrases is the
"base emitter junction of Q5." The rule that recognizes both of these
phrases, <PART/FAULT/SPEC>, can handle the first phrase by invoking its
constituent concepts of <JUNCTION> (base emitter of Q5) and <FAULT/TYPE>
(shorted) and combining their results. In the second phrase, however, it
must construct the proper junction from the separate occurrences of the two
terminals involved. Figure 4.2 gives the rules used to recognize these two
situations. The situations are distinguished by the occurrence of the
optional constituent in the second phrase. (As will be discussed later,
the rules are procedurally encoded, which provides a natural way of
building separate semantic forms for the two cases.) Notice that the
parser does some paraphrasing, as the "meaning" of the two phrases is the
same.

Figure 4.2
Grammar Rules

<PART/FAULT/SPEC> := <FAULTABLE/THING> is <FAULT/TYPE>
                     [to <TRANSISTOR/TERMINAL/TYPE>]

<FAULTABLE/THING> := <JUNCTION> ! <TERMINAL> ! <PART>

<FAULT/TYPE> := open ! shorted

<TRANSISTOR/TERMINAL/TYPE> := base ! emitter ! collector

This  discussion  has  been  presented  as  if  the  concepts  were  defined  a 
priori  by  the  capabilities  of  the  system.  Actually,  for  the  system  to 
remain  at  all  habitable,  the  concepts  are  discovered  in  the  interplay 
between  the  statements  that  are  made  in  the  domain  and  the  capabilities  of 
the  system.  When  a particular  English  construct  is  difficult  to  handle,  it 
is  probably  an  indication  that  the  concept  it  is  trying  to  express  has  not 
been  recognized  properly  by  the  system.  In  our  example  "the  base  of  Q5  is 
shorted  to  the  emitter",  the  relationship  between  the  phrase  and  its 
meaning  is  awkward  because  the  present  concept  of  shorting  requires  a part 
or  a junction.  The  example  is  getting  at  a concept  of  shorting,  in  which 
any  two  terminals  can  be  shorted  together  (e.g.  "the  positive  terminal  of 
R9 is shorted to the anode of D6"). This is a viable conceptual view of
"shorting", but its implementation requires allowing arbitrary changes in
the  topology  of  the  circuit  which  is  beyond  the  efficiency  limitations  of 
SOPHIE'S  simulator.  Thus,  the  system  we  were  working  with  led  us  to  define 
the  concept  in  too  limited  a way. 
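
The paraphrasing performed by <PART/FAULT/SPEC>, where two surface forms yield one meaning, can be sketched in a few lines. The Python below is an illustrative assumption rather than the procedural encoding used in the system: the tuple shapes, the `junction` helper, the fixed terminal ordering, and the decision to discard "the", "is" and "junction" before matching are all invented for the example.

```python
TERMS = ("base", "emitter", "collector")
FAULT_TYPES = ("open", "shorted")


def junction(t1, t2, transistor):
    """Canonical junction: a fixed terminal order makes both phrasings agree."""
    a, b = sorted((t1, t2), key=TERMS.index)
    return ("JUNCTION", a.upper(), b.upper(), transistor)


def parse_fault(text):
    """Sketch of <PART/FAULT/SPEC> for transistor terminals and junctions."""
    toks = [w for w in text.lower().split()
            if w not in ("the", "is", "junction")]
    terms, i = [], 0
    while i < len(toks) and toks[i] in TERMS:
        terms.append(toks[i])
        i += 1
    if not terms or i + 2 >= len(toks) or toks[i] != "of":
        return None
    transistor, fault, rest = toks[i + 1].upper(), toks[i + 2], toks[i + 3:]
    if fault not in FAULT_TYPES:
        return None
    if len(terms) == 2 and not rest:
        # "base emitter of Q5 shorted": the junction is named directly.
        thing = junction(terms[0], terms[1], transistor)
    elif len(terms) == 1 and len(rest) == 2 and rest[0] == "to" and rest[1] in TERMS:
        # "base of Q5 shorted to the emitter": build it from two terminals.
        thing = junction(terms[0], rest[1], transistor)
    elif len(terms) == 1 and not rest:
        thing = ("TERMINAL", terms[0].upper(), transistor)
    else:
        return None
    return (fault.upper(), thing)


print(parse_fault("the base emitter of Q5 is shorted"))
print(parse_fault("the base of Q5 is shorted to the emitter"))
```

The optional "to <terminal>" constituent selects the branch that assembles the junction, which mirrors how the presence of the optional element distinguishes the two situations in Figure 4.2.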

USES OF SEMANTIC INFORMATION DURING PARSING

Prediction 

Having  described  the  notion  of  a semantic  grammar,  we  will  now 
describe  the  ways  it  allows  semantic  information  to  be  used  in  the 
understanding  process.  One  use  of  semantic  grammars  is  to  predict  the 
possible  alternatives  that  must  be  checked  at  a given  point.  Consider,  for 
example,  the  phrase  "the  voltage  at  xxx".  After  the  word  "at"  is  reached 
in  the  top-down,  left-to-right  parse,  the  grammar  rule  corresponding  to  the 
concept  "measurement"  can  predict  very  specifically  the  conceptual  nature 
of "xxx": it must be a phrase that directly or indirectly specifies a
location in the circuit. For example, "xxx" could be "the junctions of the
current limiting section and the voltage reference source" but cannot be "3
ohms".
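
A small sketch may help show what "predicting the conceptual nature of xxx" amounts to in a top-down parse. The Python below is illustrative only: it lists four of the Figure 4.1 alternatives as flat symbol sequences, uses a tiny invented lexicon, and the function name `predict_next` is hypothetical.

```python
# A subset of the <MEASUREMENT> alternatives from Figure 4.1.
RULES = [
    ["<MEAS/QUANT>", "between", "<NODE>", "and", "<NODE>"],
    ["<MEAS/QUANT>", "<PREP>", "<PART>"],
    ["<MEAS/QUANT>", "<PREP>", "<JUNCTION>"],
    ["<MEAS/QUANT>", "<PREP>", "<NODE>"],
]

# Assumed word-to-category table for the example.
LEXICON = {
    "voltage": "<MEAS/QUANT>",
    "current": "<MEAS/QUANT>",
    "at": "<PREP>",
    "across": "<PREP>",
    "thru": "<PREP>",
}


def predict_next(words):
    """Return the symbols that may follow the words consumed so far."""
    expected = set()
    for rhs in RULES:
        fits = all(w == s or LEXICON.get(w) == s for w, s in zip(words, rhs))
        if fits and len(words) < len(rhs):
            expected.add(rhs[len(words)])
    return expected


print(sorted(predict_next(["voltage"])))
print(sorted(predict_next(["voltage", "at"])))
```

After "voltage at", only categories that name a place in the circuit remain, so a phrase such as "3 ohms" is never tried, while a description of a junction or node is.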

Semantic grammars also have the effect of reducing the amount of
grammatical ambiguity. In the phrase "the voltage at xxx", the
prepositional phrase "at xxx" will be associated with the noun "voltage"
without considering any alternative parses that associate it someplace
higher in the tree.

Predictive  information  is  also  used  to  aid  in  the  determination  of 
referents  for  pronouns.  If  the  above  phrase  were  "the  voltage  at  it",  the 
grammar  would  be  able  to  restrict  the  class  of  possible  referents  to 
locations. By taking advantage of the available sentence contexts to
predict the semantic class of possible referents, the referent
determination process is greatly simplified. For example:

(1a) Set the voltage control to .8?

(1b) What is the current thru R9?

(1c) What is it with it set to .9?

In  (1c),  the  grammar  is  able  to  recognize  that  the  first  "it"  refers  to  a 
measurement  that  the  student  would  like  re-taken  under  slightly  different 
conditions. The grammar can also decide that the second "it" refers
either to a potentiometer or to the load resistance (i.e. one of those things
which  can  be  set).  The  referent  for  the  first  "it"  is  the  measurement 
taken  in  (1b),  "the  current  thru  R9".  The  referent  for  the  second  "it"  is 
"the  voltage  control"  which  is  an  instance  of  a potentiometer.  The  context 
mechanism  that  selects  the  referents  will  be  discussed  later. 

Simple  Deletion 

The  semantic  grammar  is  also  used  to  recognize  simple  deletions.  The 
grammar  rule  for  each  conceptual  entity  knows  the  nature  of  that  entity's 
constituent  concepts.  When  a rule  cannot  find  a constituent  concept,  it 
can  either: 

a)  fail  (if  the  missing  concept  is  considered  to  be  obligatory  in  the 
surface  structure  representation)  or, 

b)  hypothesize  that  a deletion  has  occurred  and  continue. 

For  example,  the  concept  of  a TERMINAL  has  as  one  of  its  realizations  the 
constituent  concepts  of  a TERMINAL-TYPE  and  a PART.  When  its  grammar  rule 
finds  only  the  phrase  "the  collector",  it  uses  this  information  to  posit 
that a part has been deleted (i.e. TERMINAL-TYPE gets instantiated to "the
collector" but nothing gets instantiated to PART). The natural language
processor then uses the dependencies between the constituent concepts to
determine that the deleted PART must be a TRANSISTOR. The "meaning" of
this phrase is then "the collector of some transistor". Which transistor
is determined when the meaning is evaluated in the present dialogue
context. In particular, the semantic form returned is a call to the
function PREF on the classes of possible referents; in our example the
form would be (COLLECTOR (PREF '(TRANSISTOR))).(15) The operation of PREF
will be discussed later.
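
The deletion rule above can be sketched in a few lines. This is only an illustration in Python, not the report's LISP implementation: the table TERMINAL_TYPE_IMPLIES, the function parse_terminal and the tuple encoding of semantic forms are all assumptions made for the example, and only the collector/TRANSISTOR pairing comes from the text.

```python
# Hypothetical dependency table: the class of PART implied by each
# TERMINAL-TYPE. Only the collector -> TRANSISTOR entry is from the text.
TERMINAL_TYPE_IMPLIES = {
    "collector": "TRANSISTOR",
    "base": "TRANSISTOR",
    "emitter": "TRANSISTOR",
}


def parse_terminal(words):
    """Parse '[the] <terminal-type> [of <part>]' into a nested-tuple form.

    Returns None when the obligatory TERMINAL-TYPE is missing (the rule
    fails). When the PART is missing, a deletion is hypothesized and a
    PREF placeholder naming the admissible classes is returned instead.
    """
    words = [w.lower() for w in words if w.lower() != "the"]
    if not words or words[0] not in TERMINAL_TYPE_IMPLIES:
        return None
    terminal_type = words[0]
    if len(words) >= 3 and words[1] == "of":
        part = words[2].upper()  # PART stated explicitly
    else:
        part = ("PREF", (TERMINAL_TYPE_IMPLIES[terminal_type],))
    return (terminal_type.upper(), part)


print(parse_terminal(["the", "collector"]))
print(parse_terminal("the collector of Q5".split()))
```

The placeholder is left unresolved on purpose: which transistor is meant is decided later, when the form is evaluated against the dialogue context.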

Ellipsis 

Another use of the semantic grammar allows the processor to recognize
elliptic utterances. These are utterances that do not express complete
thoughts (a completely specified question or command) but only give
differences between the intended thought and an earlier one.(16) For
example, (2b), (2c) and (2d) are elliptic utterances.

(15) The language LISP will be used in examples throughout this thesis. In
LISP, a function call is expressed in Cambridge-Polish notation: as a
parenthesized list of the function name followed by its arguments.

(2a)  What  is  the  voltage  at  Node  5? 

(2b)  At  Node  1? 

(2c)  and  Node  2? 

(2d)  What  about  between  nodes  7 and  8? 

Ellipses  can  begin  with  introductory  phrases  such  as  "and"  in  2c  or  "what 
about"  in  2d;  however  this  is  not  required  as  can  be  seen  in  2b.  Part  of 
the  ellipsis  rule  is  given  in  Figure  4-3. 

Figure 4.3
Ellipsis Rule

<ELLIPSIS> := [<ELLIPSIS/INTRODUCER>] <REQUEST/PIECE> !
              [<ELLIPSIS/INTRODUCER>] if <PART/FAULT/SPEC>

<REQUEST/PIECE> := [<PREP>] <NODE> !
                   [<PREP>] <PART> !
                   between <NODE> and <NODE> !
                   [<PREP>] <JUNCTION> !
                   etc.

The  grammar  rule  identifies  which  concept  or  class  of  concepts  are  possible 
from  the  context  available  in  the  elliptic  utterance. 

While  the  parser  is  usually  able  to  determine  the  intended  concepts 
from  the  context  available  in  an  elliptic  utterance,  this  is  not  always  the 
case.  Consider  the  following  two  sequences  of  statements. 

(3a)  What  is  the  voltage  at  Node  5? 

(3b)  10? 

(4a)  What  is  the  output  voltage  if  the  load  is  100? 

(4b)  10? 

In  (3b),  "10"  refers  to  node  10,  while  in  (4b)  it  refers  to  a load  of  10. 

The problem this presents to the parser is that the concepts underlying
these two elliptic utterances have nothing in common except their surface
realizations. The parser, which operates from conceptual entities, does not
have a concept that includes both of these interpretations. One solution
would be to have the parser find all parses (concepts) and then choose
between them on the basis of context. Unfortunately, this would mean that
time is wasted looking for more than one parse for the large percentage of
sentences in which it is not necessary to do so. A better solution would
be to allow structure among the concepts, so that the parser would
recognize "10" as a member of the concept "number". Then the routines that
find the referent would know that numbers can be either node numbers or
values. This type of recognition could profitably be performed by a
bottom-up approach to parsing. However, its advantages over the present
scheme are not enough to justify the expense incurred by a bottom-up parse
to find all possible well-formed constituents. At present, the parser
assumes one interpretation, and a message is printed to the student
indicating the assumed interpretation. If it is wrong, the student must
supply more context in his request. In fact, "10?" is taken as a load
specification and if the student meant the node he would have to use "at
10", "N10" or "Node 10". Later we will discuss the mechanism that
determines to which complete thought an ellipsis refers.

(16) The standard use of the word "ellipsis" refers to any deletion.
Rather than invent a new word, we shall use the restricted meaning here.

USING CONTEXT TO DETERMINE REFERENTS

Pronouns and Deletions

Once  the  parser  has  determined  the  existence  and  class  (or  set  of 
classes)  of  a pronoun  or  deleted  object,  the  context  mechanism  is  invoked 
to  determine  the  proper  referent.  This  mechanism  has  a history  of  student 
interactions  during  the  current  session  which  contains,  for  each 
interaction,  the  parse  (meaning)  of  the  student's  statement  and  the 
response  calculated  by  the  system.  This  list  provides  the  range  of 
possible  referents  and  is  searched  in  reverse  order  to  find  an  object  of 
the  proper  semantic  class  (or  one  of  the  proper  classes).  To  aid  in  the 
search,  the  context  mechanism  knows  how  each  of  the  procedural  specialists 
appearing  in  a parse  uses  its  arguments.  For  example,  the  specialist 
MEASURE  has  a first  argument  that  must  be  a quantity  and  a second  argument 
that  must  be  a part,  a junction,  a section,  a terminal  or  a node.  Thus 
when  the  context  mechanism  is  looking  for  a referent  that  can  either  be  a 
PART  or  a JUNCTION,  it  will  look  at  the  second  argument  of  a call  to 
MEASURE  but  not  the  first.  Using  the  information  about  the  specialists, 
the  context  mechanism  looks  in  the  present  parse  and  then  in  the  next  most 
recent  parse,  etc.  until  an  object  from  one  of  the  specified  classes  is 
found.

The significance of using the specialist to filter the search instead
of  just  keeping  a list  of  previously  mentioned  objects  is  that  it  avoids 
mis-interpretations  due  to  object-concept  ambiguity.  As  an  example, 
consider  the  following  sequence  from  the  sample  dialogue  in  Chapter  3: 

(5)  What  is  the  current  thru  the  CC  when  the  VC  is  1.0? 

(6)  What  is  it  when  it  is  .8? 

Sentence (5) will be recognized by the following rules from the semantic
grammar:

1) <REQUEST> := <SIMPLE/REQUEST> when <SETTING/CHANGE>
2) <SIMPLE/REQUEST> := what is <MEASUREMENT>
3) <MEASUREMENT> := <MEAS/QUANT> <PREP> <PART>
4) <SETTING/CHANGE> := <CONTROL> is <CONTROL/VALUE>
5) <CONTROL> := VC

with a resulting semantic form of:

(RESETCONTROL (STQ VC 1.0)
              (MEASURE CURRENT CC))

RESETCONTROL  is  a function  whose  first  argument  specifies  a change  to 
one  of  the  controls  and  whose  second  argument  consists  of  a form  to  be 
evaluated in the resulting instrument context. STQ is used to change the
setting of one of the controls. The first argument to MEASURE gives the
quantity to be measured. The second specifies where it is to be measured.
To recognize sentence (6), the applications of rules 2 and 5 are changed.
There is an alternative rule for <SIMPLE/REQUEST> that looks for those
anaphora that refer to a measurement. These phrases, such as "it", "that
result" or "the value", are recognized by the non-terminal
<MEASUREMENT/PRONOUN>. The alternative to rule 2 that would be used to parse
(6) is:

<SIMPLE/REQUEST> := what is <MEASUREMENT/PRONOUN>

The semantics of <MEASUREMENT/PRONOUN> indicate that an entire measurement
has been deleted. The alternative to rule 5:

<CONTROL> := it

recognizes "it" as an acceptable way to specify a control. The resulting
semantic form for sentence (6) is:

(RESETCONTROL (STQ (PREF '(CONTROL)) .8)
              (PREF '(MEASUREMENT)))

The function PREF searches back through the context of previous semantic
forms to find the most recent mention of a member of one of the classes. In
the above example, it will find the control VC but not CC because the
character imposed on the arguments of MEASURE is that of a "part", not a
"control".(17) The presently recognized classes for deletions are PART,
TRANSISTOR, FAULT, CONTROL, POT, SWITCH, DIODE, MEASUREMENT and QUANTITY.
(The members of the classes are derived from the semantic network
associated with a circuit.)
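
The filtered search performed by PREF can be sketched as follows. This is a Python illustration under stated assumptions, not the report's LISP code: semantic forms are encoded as nested tuples, and the table ARG_CLASSES (the classes each specialist imposes on its argument positions) is hypothetical apart from what the text says about MEASURE.

```python
# Hypothetical table of the classes each specialist imposes on its
# arguments. The MEASURE entry follows the text; the others are assumed.
ARG_CLASSES = {
    "MEASURE": [{"QUANTITY"},
                {"PART", "JUNCTION", "SECTION", "TERMINAL", "NODE"}],
    "STQ": [{"CONTROL"}, {"VALUE"}],
    "RESETCONTROL": [set(), {"MEASUREMENT"}],
}

# Semantic form of sentence (5), as the only entry in the history list.
HISTORY = [("RESETCONTROL", ("STQ", "VC", 1.0), ("MEASURE", "CURRENT", "CC"))]


def pref(classes, history):
    """Return the most recent object used in one of the wanted classes."""
    wanted = set(classes)
    for form in reversed(history):  # most recent parse first
        found = _search(form, wanted)
        if found is not None:
            return found
    return None


def _search(form, wanted):
    if not isinstance(form, tuple):
        return None
    head, args = form[0], form[1:]
    # An argument qualifies only if the specialist imposes a wanted class
    # on that position; the object's own name is never consulted.
    for arg, imposed in zip(args, ARG_CLASSES.get(head, [])):
        if imposed & wanted:
            return arg
    for arg in args:  # otherwise look inside nested calls
        found = _search(arg, wanted)
        if found is not None:
            return found
    return None


print(pref(["CONTROL"], HISTORY))
print(pref(["MEASUREMENT"], HISTORY))
```

Because the search is driven by argument positions rather than by a flat list of mentioned objects, CC is never offered as a candidate control: it appears only in the second argument of MEASURE.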

Referents  for  Ellipses 

If  the  problem  of  pronoun  resolution  is  looked  upon  as  finding  a 
previously  mentioned  object  for  a currently  specified  use,  then  the  problem 
of  ellipsis  can  be  thought  of  as  finding  a previously  mentioned  use  for  a 
currently  specified  object.  For  example: 

(7)  What  is  the  base  current  of  Q4? 

(8) In Q5?

The given object is "Q5", and the earlier function is "base current". For
a given elliptic phrase, the semantic grammar identifies the concept (or
class of concepts) involved. In (8), since Q5 is recognized by the
non-terminal <TRANSISTOR/SPEC>, the class would be TRANSISTOR. The context
mechanism  then  searches  for  a specialist  in  a previous  parse  that  accepted 
the  given  class  as  an  argument.  When  one  is  found,  the  new  phrase  is 
placed  in  the  proper  argument  position  and  the  modified  parse  is  used  as 
the  meaning  of  the  ellipsis. 
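
Ellipsis resolution is the mirror image of pronoun resolution, and a sketch makes the symmetry visible. The Python below is illustrative only: the tuple encoding, the ARG_CLASSES table and the semantic form assumed for "What is the base current of Q4?" are assumptions, not forms given in the report.

```python
# Hypothetical table of the classes each specialist accepts per argument.
ARG_CLASSES = {
    "MEASURE": [{"QUANTITY"}, {"PART", "TRANSISTOR", "NODE"}],
    "STQ": [{"CONTROL"}, {"VALUE"}],
}


def resolve_ellipsis(phrase, cls, history):
    """Find the most recent parse with a slot accepting `cls`, fill it."""
    for form in reversed(history):
        new_form = _substitute(form, phrase, cls)
        if new_form is not None:
            return new_form
    return None


def _substitute(form, phrase, cls):
    if not isinstance(form, tuple):
        return None
    head, args = form[0], list(form[1:])
    imposed = ARG_CLASSES.get(head, [])
    for i in range(min(len(args), len(imposed))):
        if cls in imposed[i]:
            args[i] = phrase  # place the new phrase in the argument slot
            return (head, *args)
    for i, arg in enumerate(args):  # otherwise try nested calls
        new_arg = _substitute(arg, phrase, cls)
        if new_arg is not None:
            args[i] = new_arg
            return (head, *args)
    return None


# Assumed parse of "What is the base current of Q4?"
history = [("MEASURE", "BASE-CURRENT", "Q4")]
print(resolve_ellipsis("Q5", "TRANSISTOR", history))
```

The original parse is not modified; a new form is built, so the history still records what the student actually asked.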

Limitations  to  the  Context  Mechanism 

The method of semantic classification (to determine reference) is very
efficient and works well over our domain. It definitely does not solve all
the problems of reference. Charniak has pointed out the substantial
problems of reference in a domain as seemingly simple as children's stories
(1972). One of his examples demonstrates how much world knowledge may be
required to determine a referent (1972 p. 7).


Janet and Penny went to the store to get presents for Jack. Janet
said "I will get Jack a top." "Don't get Jack a top" said Penny. "He
has a top. He will make you take it back."

Charniak argues that understanding which of the two tops "it" refers to
requires knowing about presents, stores and what they will take
back, etc. Even in domains where it may be possible to capture all of the
necessary knowledge, classification may still lead to ambiguities. For
example, consider the following:

(9) What is the voltage at Node 5 if the load is 100?

(10) Node 6?

(11) 7?

In statement (11) the user means Node 7. In statement (10), he has
reinforced the use of ellipsis as referring to node number. (For example,
if statement (10) is left out, statement (11) is much more awkward.) On the
other hand, if statement (11) had been "1000" or if statement (10) had been
"10?", things would be more problematic. When statement (11) is "1000", we
can infer that he means a load of 1000 because there is no node 1000. If
statement (10) had been "10?", there would be genuine ambiguity slightly
favoring the interpretation as a load because that was the last number
mentioned. The major limitation of the current technique, which must be
overcome in order to tackle significantly more complicated domains, is its
inability to return more than one possible referent. It considers each one
individually until it finds one which is satisfactory. The amount of work
involved in employing a technique which allows comparing referents has not
been justified by our experience.

(17) The character imposition as described is too strong, for example:

1) What are the specs of Q5?

2) What is the voltage at its emitter?

The character imposed on Q5 in 1 is that of a part, which means that the
context mechanism invoked by 2, which is looking for a transistor, won't
find it. This example is handled by relaxing the restrictions the
procedural specialist in 1 puts on its argument (i.e. it can be either a
PART or a TRANSISTOR). In spite of this weakness in the argument
limitation approach, we have found it to be a useful means of reducing the
search time and avoiding some obvious mis-interpretations.

RELATIONSHIP TO OTHER SEMANTIC SYSTEMS

The  relationship  between  semantic  grammars  and  purely  semantic 
systems  (Quillian  1969;  Schank  et  al.  1975)  and  to  some  extent  Wilks 
(1973a, 1973b) parallels the distinction between procedural and declarative
knowledge. The relationship that exists between nodes in the semantic
network structure contains little or no information about how these
relationships might be expressed in language. An interpretation mechanism
must decide where the information is useful. While this is, in some sense,
more general (the same information can be used for several purposes given
the proper interpreter), it is necessarily less efficient. (Wilks has
extracted  some  expressive  information,  primarily  concept  order,  into  his 
templates.)  A semantic  grammar,  on  the  other  hand,  is  written  for  the 
process  of  recognizing  concepts  as  they  are  expressed  in  the  surface 
structures. 

FUZZINESS 

Having  the  grammar  centered  around  semantic  categories  allows  the 
parser  to  be  sloppy  about  the  actual  words  it  finds  in  the  statement. 
Having  a concept  in  mind,  and  being  willing  to  ignore  words  to  find  it,  is 
the  essence  of  keyword  parsing  schemes.  It  is  effective  in  those  cases 
where the words that have been skipped are either redundant, or specify
gradations of an idea that are not distinguished by the system. For
example, in the sentence: "Insert a very hard fault", "very" would be
ignored; this is effective because the system does not have any further
structure over the class of hard faults. In the sentence: "What is the
voltage across resistor R8?" "resistor" can be ignored because it is implied
by "R8".(18)

One  advantage  that  a procedural  encoding  of  the  grammar  (discussed 
later)  has  over  pattern  matching  schemes  in  the  implementation  of  fuzziness 
is  its  ability  to  control  exactly  where  words  can  be  ignored.  This 
provides  the  ability  to  blend  pattern  matching  parsing  of  those  concepts 
that  are  amenable  to  it  with  the  structural  parsing  required  by  more 
complex  concepts.  The  amount  of  fuzziness  — how  many,  if  any,  words  in  a 
row  can  be  ignored  — is  controlled  in  two  ways.  First,  whenever  a grammar 
rule  is  invoked,  the  calling  rule  has  the  option  of  limiting  the  number  of 
words  that  can  be  skipped.  Second,  each  rule  can  decide  which  of  its 
constituent pieces or words are required and how tightly controlled the
search  for  them  should  be.  In  SOPHIE,  the  normal  mode  of  operation  of  the 
parser  is  tight  in  the  beginning  of  a sentence,  but  fuzzier  after  it  has 
made  sense  out  of  something. 

(18)  The  first  of  these  examples  could  be  handled  by  making  "very"  a noise 
word  (i.e.  deleting  it  from  all  sentences).  Resistor  however  is  not  a 
noise  word  in  all  cases  (e.g.  "What  is  the  current  through  the  current 
sensing  resistor?")  and  hence  cannot  be  deleted. 




Fuzziness has two other advantages worth mentioning briefly. It
reduces  the  size  of  the  dictionary  because  all  known  noise  words  don't  have 
to  be  included.  In  those  cases  where  the  skipped  words  are  meaningful,  the 
misunderstanding  may  provide  some  clues  to  the  user  which  allow  him  to 
restate  his  query. 
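
A minimal sketch of controlled fuzziness, written in Python for illustration. The function fuzzy_match and its pattern encoding are assumptions, not SOPHIE's procedural rules; the sketch only shows the two controls described above: a caller-supplied limit on consecutive skipped words, and a tight match on the first constituent.

```python
def fuzzy_match(pattern, words, max_skip):
    """Match a sequence of required items, ignoring some words in between.

    pattern  : list of sets, each holding the acceptable words for one item
    words    : list of input tokens
    max_skip : most words in a row that may be ignored before an item
    The first item must match the first word (tight at the beginning).
    Returns (matched_words, next_index), or None if the match fails.
    """
    pos = 0
    matched = []
    for i, alternatives in enumerate(pattern):
        limit = 0 if i == 0 else max_skip
        skipped = 0
        while pos < len(words) and words[pos] not in alternatives:
            if skipped == limit:
                return None  # too many unrecognized words in a row
            skipped += 1
            pos += 1
        if pos == len(words):
            return None  # ran out of input before finding the item
        matched.append(words[pos])
        pos += 1
    return matched, pos


pattern = [{"insert"}, {"hard", "soft"}, {"fault"}]
sentence = "insert a very hard fault".split()
print(fuzzy_match(pattern, sentence, max_skip=2))
print(fuzzy_match(pattern, sentence, max_skip=1))
```

In this example "a" is skipped along with "very" only because no preprocessing has removed it; with the noise-word deletion described in the next section, a limit of one skipped word would be enough.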

PREPROCESSING

Before  a statement  is  parsed,  a preprocessor  performs  three 
operations.  The  first  expands  abbreviations,  deletes  known  noise  words, 
and  canonicalizes  similar  words  to  a common  form.  The  second  is  a cursory 
spelling  correction.  The  third  is  a reduction  of  compound  words. 

Spelling correction is attempted on any word of the input string that
the system does not recognize. The spelling correction algorithm(19) takes
the possibly misspelled word, and a list of correctly spelled words, and
determines which, if any, of the correct words is close to the misspelled
word (using a metric determined by number of transpositions, doubled
letters, dropped letters, etc.). During the initial preprocessing, the
list of correct words is very small (approximately a dozen) and is limited
to very commonly misspelled words and/or words that are critical to the
understanding of a sentence. The list is kept small so that the time spent
attempting spelling correction, prior to attempting a parse, is kept to a
minimum. Remember that the parser has the ability to ignore words in the
input string so we do not want to spend a lot of time correcting a word
that won't be needed in understanding the statement. But notice that
certain words can be critical to the correct understanding of a statement.
For example, suppose that the phrase "the base emitter current of Q3" was
incorrectly typed as "the bse emitter current of Q3". If "bse" were not
recognized as being "base" the parser would ignore it and (mis-)understand
the phrase as "the emitter current of Q3", a perfectly acceptable but much
different concept.(20) Because of this problem, words like "base", which
if ignored have been found to lead to misunderstandings, are considered
critical and their spelling is corrected before any parse is attempted.

Note that there are many words ("capacitor", "replace", "open", for
example) that if misspelled would prevent the parser from making sense of
the statement, but would not lead to any misunderstandings. These words
therefore are not considered to be critical, and would be corrected in the
second attempt at spelling correction that is done after a statement fails
to parse.

(19) The spelling correction routines are provided by INTERLISP and were
developed by Teitelman for use in the DWIM facility (Teitelman 1969, 1974).

(20) To minimize the consequences of such misinterpretation, the system
always responds with an answer that indicates what question it is
answering, rather than just giving the numeric answer.
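
The two-pass correction scheme can be sketched as below. Note the substitution: the report uses the INTERLISP DWIM metric, whereas this Python illustration uses the standard library's difflib similarity ratio as a stand-in. The word lists, the cutoff value and the function name correct are all assumptions made for the example.

```python
import difflib

# Hypothetical word lists. The report's critical list has about a dozen
# entries; only "base" is named there as a critical word.
CRITICAL_WORDS = ["base", "emitter", "collector", "not"]
FULL_VOCABULARY = CRITICAL_WORDS + [
    "capacitor", "replace", "open", "the", "current", "of", "q3",
]


def correct(words, known, candidates, cutoff=0.75):
    """Replace each unknown word by its closest candidate, if any is close.

    difflib's ratio stands in for the DWIM metric (transpositions,
    doubled letters, dropped letters) used by the actual system.
    """
    out = []
    for word in words:
        if word in known:
            out.append(word)
            continue
        close = difflib.get_close_matches(word, candidates, n=1, cutoff=cutoff)
        out.append(close[0] if close else word)
    return out


# First pass, before parsing: only the short critical list is consulted.
print(correct("the bse emitter current of q3".split(),
              FULL_VOCABULARY, CRITICAL_WORDS))

# A non-critical misspelling survives the first pass and is corrected
# only in the second pass, which would run after a failed parse.
first = correct(["replce", "q3"], FULL_VOCABULARY, CRITICAL_WORDS)
print(first, correct(first, FULL_VOCABULARY, FULL_VOCABULARY))
```

Keeping the first candidate list short is what keeps the pre-parse cost low; the full vocabulary is searched only when the parse has already failed.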

Compound  words  are  single  concepts  that  appear  in  the  surface 
structure  as  a fixed  series  of  more  than  one  word.  Their  reduction  is  very 
important  to  the  efficient  operation  of  the  parser.  For  example,  in  the 
question  "what  is  the  voltage  range  switch  setting?",  "voltage  range 
switch"  is  rewritten  as  the  single  item  "VR".  If  not  rewritten,  "voltage" 
would  be  mistaken  as  the  beginning  of  a measurement  (as  in  "what  is  the 
voltage  at  N4")  and  an  attempt  would  have  to  be  made  to  parse  "range  switch 
setting"  as  a place  to  measure  voltage.  Of  course  after  this  failed,  the 
correct  parse  can  still  be  found,  but  reducing  compound  words  helps  to 
avoid backtracking. In addition, the reduction of compound words
simplifies the grammar rules by allowing them to work with larger
conceptual units. In this sense, the preprocessing can be viewed as a
preliminary bottom-up parse that recognizes local, multi-word concepts.
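
Compound-word reduction amounts to a longest-match rewrite over the token list. The Python sketch below is illustrative: the table COMPOUNDS and the function reduce_compounds are assumptions, and only the "voltage range switch" to "VR" entry is taken from the text.

```python
# Hypothetical compound table. The VR entry is from the text; the VC
# entry is an assumed example of the same kind.
COMPOUNDS = {
    ("voltage", "range", "switch"): "VR",
    ("voltage", "control"): "VC",
}


def reduce_compounds(words, compounds=COMPOUNDS):
    """Rewrite each known multi-word compound as a single item."""
    longest = max(len(key) for key in compounds)
    out, i = [], 0
    while i < len(words):
        # Try the longest possible compound first, down to two words.
        for n in range(min(longest, len(words) - i), 1, -1):
            key = tuple(w.lower() for w in words[i:i + n])
            if key in compounds:
                out.append(compounds[key])
                i += n
                break
        else:
            out.append(words[i])  # no compound starts here
            i += 1
    return out


print(reduce_compounds("what is the voltage range switch setting".split()))
print(reduce_compounds("what is the voltage at N4".split()))
```

After the rewrite, the top-down parser never sees "voltage" at the head of "voltage range switch", so it does not start down the measurement rule and have to back out.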

IMPLEMENTATION 

Once  the  dependencies  between  semantic  concepts  have  been  expressed  in 
the  BNF  form,  each  rule  in  the  grammar  is  encoded  (by  hand)  as  a LISP 
procedure. This encoding process imparts to the grammar a top-down control
structure,  specifies  the  order  of  application  of  the  various  alternatives 
of  each  rule,  and  defines  the  process  of  pattern  matching  each  rule.  The 
resulting  collection  of  LISP  functions  constitutes  a goal-oriented  parser 
in  a fashion  similar  to  SHRDLU  (Winograd  1973),  but  without  the 
backtracking  ability  of  PROGRAMMAR. 

As  has  been  argued  elsewhere  (Woods  1970;  Winograd  1973),  encoding  the 
grammars  as  procedures  — including  the  notion  of  process  in  the  grammar  — 
has  advantages  over  using  traditional  phrase  structure  grammar 
representations.  Four  of  these  advantages  are: 

1) the ability to collapse common parts of a grammar rule while still
maintaining the perspicuity of the grammar.

2) the ability to collapse similar rules by passing arguments (as with
SENDR).

3) the ease of interfacing other types of knowledge (in SOPHIE, primarily
the semantic network) into the parsing process.

4) the ability to build and save arbitrary structures during the parsing
process.(21)

In  addition  to  the  advantages  it  shares  with  other  procedural 
representations,  the  LISP  encoding  has  the  computational  advantage  of  being 
compilable  directly  into  efficient  machine  code.  The  LISP  implementation 
is  efficient  because  the  notion  of  process  it  contains  (one  process  doing 
recursive  descent)  is  close  to  that  supported  by  physical  machines,  while 
those of ATN and PROGRAMMAR are non-deterministic and hence not directly
translatable into present architecture. (See (Burton 1976) for a
description of how it is possible to minimize this mismatch.) Appendix B
describes the details of the LISP implementation and provides an example of
a rule from the grammar.

In  terms  of  efficiency,  the  LISP  implementation  of  the  semantic 
grammar  succeeds  admirably.  The  grammar  written  in  INTERLISP  (Teitelman 
1974)  can  be  block  compiled.  Using  this  technique,  the  complete  parser 
takes  about  5K  of  storage  and  parses  a typical  student  statement  consisting 
of  8 to  12  words  in  around  150  milliseconds!  Appendix  C presents  parses 
and  timings  of  some  of  the  sentences  used  in  the  dialogue. 


(21) This ability is sometimes provided by allowing augments on phrase
structure rules.


A NEW FORMALISM -- SEMANTIC AUGMENTED TRANSITION NETWORKS

Using  the  techniques  described  in  Chapter  4,  a natural  language 
front-end,  capable  of  supporting  the  dialogue  presented  in  Chapter  3,  and 
requiring  less  than  200  milliseconds  cpu  time  per  question,  was 
constructed.  In  addition,  these  same  techniques  were  used  to  build  a 
front-end for NLS-SCHOLAR (Grignetti et al. 1974; Grignetti et al. 1975)
(built by C. Hausmann), and an interface to an experimental laboratory for
exploring mathematics using attribute blocks (Brown et al. 1976). In the
construction  of  these  varying  systems,  the  notion  of  semantic  grammar 
proved  to  be  useful.  The  LISP  implementation,  however,  was  found  to  be  a 
bit  unwieldy.  While  expressing  the  grammar  as  programs  has  benefits  in  the 
area  of  efficiency  and  allows  complete  freedom  to  explore  new  extensions, 
the  technique  is  lacking  in  perspicuity.  This  lack  of  perspicuity  has 
three  major  drawbacks:  (1)  the  difficulty  encountered  when  trying  to 
modify  or  extend  the  grammar;  (2)  the  problem  of  trying  to  communicate  the 
extent  of  the  grammar  to  either  a user  or  a colleague;  (3)  the  problem  of 
trying  to  re-implement  the  grammar  on  a machine  that  does  not  support  LISP. 
These  difficulties  have  been  partially  overcome  by  using  a second,  parallel 
representation  of  the  grammar  in  a BNF-like  specification  language  which  is 
the  representation  we  have  been  presenting  throughout  this  report.  This, 
however,  requires  supporting  two  different  representations  of  the  same 
information  and  does  not  really  solve  problems  (1)  or  (3).  The  solution 
to  this  problem  is  a better  formalism  for  expressing  and  thinking  about 
semantic  grammars. 

Augmented Transition Networks (ATN)

Some years ago, Chomsky (1957) introduced the notion that the
processes of language generation and language recognition could be viewed
in terms of a machine. One of the simplest of such models is the finite
state machine. It starts off in its initial state looking at the first
symbol, or word, of its input sentence and then moves from state to state
as it gobbles up the remaining input symbols. The sentence is accepted if
the machine stops in one of its final states after having processed the
entire input string; otherwise the sentence is rejected. A convenient way
of representing a finite state machine is as a transition graph, in which
the states correspond to the nodes of the graph and the transitions between
states correspond to its arcs. Each arc is labelled with a symbol whose
appearance in the input can cause the given transition.
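
A transition graph of this kind is easy to write down as a table. The Python sketch below is illustrative only; the states, the arcs and the tiny vocabulary are invented for the example and do not come from the report.

```python
# Hypothetical transition graph: (state, input word) -> next state.
TRANSITIONS = {
    ("S0", "what"): "S1",
    ("S1", "is"): "S2",
    ("S2", "the"): "S3",
    ("S3", "voltage"): "S4",
    ("S3", "current"): "S4",
    ("S4", "at"): "S5",
    ("S5", "n1"): "S6",
    ("S5", "n2"): "S6",
}
FINAL_STATES = {"S6"}


def accepts(words, start="S0"):
    """Accept iff the machine ends in a final state with no input left."""
    state = start
    for word in words:
        state = TRANSITIONS.get((state, word))
        if state is None:  # no arc labelled with this word
            return False
    return state in FINAL_STATES


print(accepts("what is the voltage at n2".split()))
print(accepts("what is the voltage".split()))
```

Every arc here is labelled with a literal word and the machine keeps no memory beyond its current state, which is exactly what the augmentations described next are meant to relax.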

In an augmented transition network, the notion of a transition graph
has been modified in three ways: (1) the addition of a recursion mechanism
that allows the labels on the arcs to be non-terminal symbols that
correspond to networks; (2) the addition of arbitrary conditions on the
arcs that must be satisfied in order for an arc to be followed; (3) the
inclusion of a set of structure building actions on the arcs, together with
a set of registers for holding partially built structures.(22) Figure 5-1
is a specification of a language for representing augmented transition
networks. The specification is given in the form of an extended,
context-free grammar in which alternative ways of forming a constituent are
represented on separate lines and the symbol "+" is used to indicate
arbitrarily repeatable constituents.(23) The non-terminal symbols are
lower case English descriptions enclosed in angle brackets. All other
symbols, except "+", are terminals. Non-terminals not given in Figure 5-1
have names that should be self-explanatory.

Figure 5.1

A Language for Representing ATNs

<transition network> := (<arc set> <arc set>+)

<arc set> := (<state> <arc>+)

<arc> := (CAT <category name> <test> <action>+ <term act>)
         (WRD <word> <test> <action>+ <term act>)
         (PUSH <state> <test> <action>+ <term act>)
         (TST <arbitrary label> <test> <action>+ <term act>)
         (POP <form> <test>)
         (VIR <constituent name> <test> <action>+ <term act>)
         (JUMP <state> <test> <action>+)

<action> := (SETR <register> <form>)
            (SENDR <register> <form>)
            (LIFTR <register> <form>)
            (HOLD <constituent name> <form>)
            (SETF <feature> <form>)

<term act> := (TO <state>)

<form> := (GETR <register>)
          LEX
          *
          (GETF <form> <feature>)
          (BUILDQ <fragment> <register>+)
          (LIST <form>+)
          (APPEND <form> <form>)
          (QUOTE <arbitrary structure>)

(22) This discussion follows closely a similar discussion in Woods (1970)
to which the reader is referred. If the reader is familiar with the ATN
formalism he/she may wish to skip to the section "Advantages to the ATN
Formalism".

(23) "+" is used to mean 0 or more occurrences. While the accepted usage
of "+" is 1 or more, the accepted symbol for 0 or more, "*", has not been
used to avoid confusion with the use of the symbol * in the ATN formalism.

The first element of each arc is a word indicating the type of arc.
For CAT, WRD and PUSH arcs, the arc type together with the second element
correspond to the label on an arc of a state transition graph. The third
element is an additional test. A CAT arc can be followed if the current
input symbol is a member of the lexical category named on the arc and if
the test on the arc is satisfied. A PUSH arc causes a recursive invocation
of a lower level network beginning at the state indicated, if the test is
satisfied. The WRD arc can be followed if the current input symbol is the
word named on the arc and if the test is satisfied. The TST arc can be
followed if the test is satisfied (the label is ignored). The VIR arc
(virtual arc) can be followed if a constituent of the named type has been
placed on the hold list by a previous HOLD action and the constituent
satisfies the test. In all of these arcs, the actions are structure
building actions, and the terminal action specifies the state to which
control is passed as a result of the transition. After CAT, WRD and TST
arcs, the input is advanced; after VIR and PUSH arcs it is not. The JUMP
arc can be followed whenever its test is satisfied, control being passed to
the state specified in the second element of the arc without advancing the
input. The POP arc indicates the conditions under which the state is to be
considered a final state and the form of the constituent to be returned.

The  actions,  forms  and  tests  on  an  arc  may  be  arbitrary  functions  of 
the  register  contents.  Figure  5.1  presents  a useful  set  that  illustrates 
major features of the ATN. The first three actions specified in Figure 5.1
cause  the  contents  of  the  indicated  register  to  be  set  to  the  value  of  the 
indicated  form.  SETR  causes  this  to  be  done  at  the  current  level  of 
computation,  SENDR  at  the  next  lower  level  of  embedding,  so  that 
information  can  be  sent  down  during  a PUSH,  and  LIFTR  at  the  next  higher 
level  of  computation,  so  that  additional  information  can  be  returned  to 
higher  levels.  The  HOLD  action  places  a form  on  the  HOLD  list  to  be  used 
at  a later  place  in  the  computation  by  a VIR  arc.  SETF  provides  a means  of 
setting  a feature  of  the  constituent  being  built. 
GETR is a function whose value is the contents of the named register.
LEX  is  a form  whose  value  is  the  current  input  symbol.  The  asterisk  (*)  is 
a form  whose  value  depends  on  the  context  of  its  use:  (1)  in  the  actions 
of  a CAT  arc,  the  value  of  * is  the  root  form  of  the  current  input  word; 
(2)  in  the  actions  of  a PUSH  arc,  it  is  the  value  of  the  lower 
computation;  and  (3)  in  the  actions  following  a VIR  arc,  the  value  of  it 
is  the  constituent  removed  from  the  HOLD  list.  GETF  is  a function  which 
determines  the  value  of  a specified  feature  of  the  indicated  form  (which  is 
usually  *).  BUILDQ  is  a general  structure-building  form  that  places  the 
values  of  the  given  registers  into  a specified  tree  fragment. 
Specifically,  it  replaces  each  occurrence  of  + in  the  tree  fragment  with 
the  contents  of  one  of  the  registers  (the  first  register  replacing  the 
first  occurrence  of  +,  the  second  register  the  second,  etc.).  In  addition, 
BUILDQ  replaces  occurrences  of  * by  the  value  of  the  form  *.  The  remaining 
three forms make a list out of the specified arguments (LIST), append two
lists  together  to  make  a single  list  (APPEND)  and  produce  as  a value  the 
(unevaluated)  argument  form  (QUOTE). 
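
The behaviour of BUILDQ described above can be sketched in a few lines. The sketch below is written in Python rather than LISP, purely as an illustration; the function name `buildq`, the use of nested lists for tree fragments, and the register names are assumptions of this sketch and not part of the ATN system itself.

```python
def buildq(fragment, registers, names, star=None):
    """Fill a tree fragment in the manner described for BUILDQ.

    Each "+" is replaced by the contents of the next register listed in
    `names` (first "+" gets the first register, and so on); each "*" is
    replaced by the value of the form * (passed here as `star`).
    """
    pending = iter(names)

    def fill(node):
        if node == "+":
            return registers[next(pending)]
        if node == "*":
            return star
        if isinstance(node, list):
            # Children are filled left to right, so "+" order is preserved.
            return [fill(child) for child in node]
        return node

    return fill(fragment)


# Hypothetical register contents, for illustration only.
registers = {"SUBJ": ["NP", "John"], "V": "go"}
tree = buildq(["S", "+", ["VP", "+", "*"]], registers, ["SUBJ", "V"],
              star=["PP", "to", "store"])
```

Here `tree` holds the fragment with the SUBJ register in the first slot, the V register in the second, and the current constituent in place of the asterisk.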

Advantages  of  ATN  Formalism 

The ATN formalism was seriously considered at the beginning of the
SOPHIE project, but rejected as being too slow. In the course of
developing the LISP grammar, it became clear that the primary reason for
the significant difference in speed between an ATN grammar and a LISP
grammar is that the ATN is interpreted, whereas LISP is compilable. The
time problem could therefore be overcome by building an ATN compiler.
During the period of evolution of SOPHIE's grammar, an ATN compiler was
constructed (see Burton 1976). In the next section we will discuss the
advantages we hoped to gain by using the ATN formalism.

These advantages fall into three general areas: (1) conciseness, (2)
conceptual effectiveness and (3) available facilities. By conciseness we
mean that writing a grammar as an ATN takes fewer characters than writing
it in LISP. The ATN formalism gains conciseness by not requiring the
specification of details in the parsing process at the same level required
in LISP. Most of these differences stem from the fact that the ATN assumes
it has a machine whose operations are designed for parsing, while LISP
assumes it has a lambda calculus machine. For example, a lambda calculus
machine assumes a function has one value. A function call to look for an
occurrence of a non-terminal while parsing (in ATN formalism, a PUSH) must
return at least two values: the structure of the constituent found, and
the place in the input where the parsing stopped. A good deal of
complexity is added to the LISP rules in order to maintain the free
variable which has to be introduced to return the structure of the
constituent. Other examples of unnecessary details include the binding of
local variables and the specification of control structure as ANDs, ORs
and CONDs.
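
The two-valued nature of a PUSH can be made concrete with a small sketch. It is written in Python, where a function can return a pair directly, and is not the LISP code of the SOPHIE grammar; the toy rule for a terminal, the word lists and all function names are hypothetical. Each rule hands back both the structure found and the input position where parsing stopped, which is the bookkeeping the text says must be maintained by hand in a single-valued setting.

```python
from typing import List, Optional, Tuple

# A result is (constituent structure, next input position), or None on failure.
Result = Optional[Tuple[list, int]]


def parse_word(words: List[str], pos: int, choices: set, label: str) -> Result:
    """Accept one word from `choices` at `pos`."""
    if pos < len(words) and words[pos] in choices:
        return [label, words[pos]], pos + 1
    return None


def parse_terminal(words: List[str], pos: int) -> Result:
    """Toy rule (an assumption): <terminal> := the <type> of <part>."""
    if pos >= len(words) or words[pos] != "the":
        return None
    found = parse_word(words, pos + 1, {"base", "collector", "emitter"}, "TYPE")
    if found is None:
        return None
    ttype, pos = found
    if pos >= len(words) or words[pos] != "of":
        return None
    found = parse_word(words, pos + 1, {"q5", "q6"}, "PART")
    if found is None:
        return None
    part, pos = found
    return ["TERMINAL", ttype, part], pos


result = parse_terminal("what is the base of q5 now".split(), 2)
```

Every caller has to unpack and re-thread the position explicitly; in the ATN formalism the PUSH arc does this implicitly.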

The  conciseness  of  the  ATN  results  in  a grammar  that  is  easier  to 
change,  easier  to  write  and  debug,  easier  to  understand,  and  hence  to 
communicate. We realize that conciseness does not necessarily lead to
these results (APL being a prime example among computer languages,
mathematics in general being another); however, this is not a problem
here. The correspondence between the grammar rules in LISP and ATN is
very close.
The  concepts  which  were  expressed  as  LISP  code  can  be  expressed  in  nearly 
the  same  way  as  ATNs  but  in  fewer  symbols. 

The second area of improvement deals with conceptual effectiveness.
Loosely defined, conceptual effectiveness is the degree to which a language
encourages one to think about problems in the right way. One example of
conceptual effectiveness can be seen by considering the implementation of
case structured rules. (24) In a typical case structure rule, the verb
expresses the function (or relation name), while the subject, object and
prepositional phrases express the arguments of the function or
relation. Let us assume for the purpose of this discussion that we are
looking at four different cases (agent, location, means, and time) of the
verb GO -- John went to the store by car at 10 o'clock. In a phrase
structure rule-oriented formalism one would be encouraged to write:

<statement> := <actor> <action/verb> <location> <means> <time>

Since the last three cases can appear in any order, one must also write 5
other rules:

<statement> := <actor> <action/verb> <location> <time> <means>

(24) See Bruce (1975) for a discussion of case systems.


In an ATN one is inclined towards:

[Figure: ATN fragment; only the arc label "PUSH location" survives]

which expresses more clearly the case structure of the rule. There is no
reason why in the LISP version of the grammar one couldn't write loops that
are exactly analogous to the ATN (the ATN compiler, after all, produces
such code!); however, a rule-oriented formalism does not encourage one to
think this way. An alternative rule implementation is:

<action> := <actor> <action/verb> <action1>

<action1> := <action1> <temporal>

<action1> := <action1> <location>

<action1> := <action1> <means>


This is easier (shorter) to write, but it has the disadvantage of being
left-recursive. To implement it, one is forced to write the LISP
equivalent of the ATN, which creates a difference between the rule
representation and the actual implementation. This method also has the
disadvantage of introducing an unmotivated non-terminal.
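
The contrast between enumerating orderings and looping over cases can be sketched as follows. This is an illustration in Python, not the SOPHIE grammar; the names `RULES` and `parse_cases` are hypothetical, and the input is assumed to arrive as already-recognized (case, phrase) pairs.

```python
from itertools import permutations

CASES = ("location", "means", "time")

# Phrase-structure style: one rule per ordering of the three cases.
RULES = [("actor", "action/verb") + order for order in permutations(CASES)]


def parse_cases(constituents):
    """ATN-style loop: accept location/means/time in any order.

    Each case may be filled at most once; an unknown or repeated case
    means there is no arc to follow, so the parse fails (None).
    """
    filled = {}
    for kind, phrase in constituents:
        if kind not in CASES or kind in filled:
            return None
        filled[kind] = phrase
    return filled


frame = parse_cases([("time", "at 10 o'clock"),
                     ("location", "to the store"),
                     ("means", "by car")])
```

`RULES` contains six rules (the one given in the text plus the five others), while the loop states the case structure once. Note one difference that is a design choice of this sketch: the loop also accepts sentences in which some cases are missing, which the six rules as written do not.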

Another conceptual advantage of the ATN framework is that it
encourages the postponing of decisions about a sentence until a
differential point is reached, thereby allowing potentially different paths
to stay together. In the rule oriented SOPHIE grammar there are top level
rules for <set>, a command to change one of the control settings, and
<modify>, a command to fault the instrument in some way. Sentence (1) is a
<set> and sentence (2) is a <modify>.

(1) Suppose the current control is high.

(2) Suppose the current control is shorted.

The two parse paths for these sentences should be the same for the first
five words, but they are separated immediately by the rules <set> and
<modify>. (25) An ATN encourages structuring the grammar so that the
decision between <set> and <modify> is postponed so that the paths remain
together. It could be argued that the fact that this example occurred in
SOPHIE's grammar is a complaint against top-down parsing or semantic
grammars, or just our particular instantiation of a semantic grammar. We
suspect the latter but argue that rule representations encourage this type
of behavior.

Another  conceptual  aid  provided  by  ATNs  is  their  method  of  handling 
ambiguity.  Our  LISP  implementation  uses  a recursive  descent  technique 
(which  can  alternatively  be  viewed  as  allowing  only  one  process).  This 
requires  that  any  decision  between  two  choices  be  made  correctly  because 
there  is  no  way  to  try  out  the  other  choice  after  the  decision  is  made.  At 
choice  points,  a rule  can,  of  course,  "look  ahead"  and  gain  information  on 
which  to  base  the  decision,  similar  to  the  "wait-and-see"  strategy  used  by 
Marcus (1975), but there is no way to back up and remake a decision once it
has  returned. 

The  effects  of  this  can  be  most  easily  seen  by  considering  the  lexical 
aspects  of  the  parsing.  A prepass  collapses  compound  words,  expands 
abbreviations,  etc.  This  allows  the  grammar  to  be  much  simpler  because  it 
can  look  for  units  like  "voltage/control"  instead  of  having  to  decode  the 
noun phrase "voltage control". Unfortunately, without the ability to handle
ambiguity,  this  rewriting  can  only  be  done  on  words  that  have  no  other 
possible  meaning.  So,  for  example,  when  the  grammar  is  extended  to  handle: 

(3)  Does  the  voltage  control  the  current  limiting  section? 

the compound "voltage/control" would have to be removed from the prepass
rules and included in the grammar. This reduces the amount of bottom-up
processing that can be done and results in a slower parse. It also makes
compound rules difficult to write because all possible uses of the
individual words must be considered to avoid errors. Another example is
the use of the letter "C" as an abbreviation. Depending on context, it
could possibly mean either current, collector or capacitor. Without
allowing ambiguity in the input, it could not be allowed as an
abbreviation unless explicitly recognized by the grammar.

(25) The degree to which the separation of paths is a problem can be
greatly reduced using a preprocessing "compilation" stage such as that of
Klovstad, which (among other things) collapses rules with the same initial
parts. In our example, however, this may not work, since the phrase "the
current control" may be parsed as the non-terminal <CONTROL> in (1) and as
the non-terminal <PART> in (2). Of course this would be a poor choice of
grammar rules, and no one aware of sentences (1) and (2) would handle it
this way. The problem is recognizing where situations such as this occur.

The third general area in which ATNs have an advantage is in the
available facilities to deal with complex linguistic phenomena. While our
grammar has not yet expanded to the point of requiring any of these
facilities, the availability of such facilities cannot be ignored as an
argument favoring one approach over another. A primary example is the
general mechanism for dealing with coordination in English described in
Woods (1973a).

Conversion to Semantic ATN

For the reasons discussed above, the SOPHIE semantic grammar was
re-written in the ATN formalism. We wish to stress here that the
re-writing was a process of changing form only. The content of the grammar
remained the same. Since a large part of the knowledge encoded by the
grammar continues to be semantic in nature, we call the resulting grammar a
"semantic ATN". Figure 5.1 presents the graphic ATN representation of a
semantic grammar non-terminal. This is the same rule presented in Figure
4.1, which recognizes the phrases for specifying measurements in a circuit.
The actions and structure building operations on the arcs (which are not
shown in Figure 5.1) save the recognized constituents and construct the
proper interpretation when sufficient information has been collected.
Appendix E provides more examples of the semantic ATN used in SOPHIE.

Figure 5.2 presents a simple example of how the recognition of
anaphoric deletions can be captured in the ATN formalism. The network in
Figure 5.2 encodes the straightforward way of expressing a terminal of a
part in the circuit -- the base of Q5, the anode of it, the collector. By
the state TERMINAL/TYPE, both the determiner and the terminal type (base,
anode) have been found. The first arc that leaves TERMINAL/TYPE accepts the
preposition  that  begins  the  specification  of  the  part.  The  second  arc 
(JUMP arc) corresponds to hypothesizing that the specification of the part
has  been  deleted,  as  in:  "The  base  is  open."  The  action  on  the  arc  builds 
a place-holding  form  which  identifies  the  deletion  and  specifies  (from 
information  associated  with  the  terminal  type  which  was  found)  the  classes 
of  objects  that  can  fill  the  deletion.  The  method  for  determining  the 
referent  of  the  deletion  remains  the  same  as  described  in  Chapter  4. 

Figure  5.2 

An  ATN  which  recognizes  deletion 


The  SOPHIE  semantic  ATN  is  then  compiled  using  the  general  ATN 
compiling  system  described  in  Burton  (1976).  The  SOPHIE  grammar  provides 
the  compiling  system  with  a good  contrast  to  the  LUNAR  grammar,  since  it 
does not use many of the potential features. In addition, a benchmark of
sorts was available from the LISP implementation of the grammar that could
be  used  to  determine  the  computational  cost  of  using  the  ATN  formalism. 

There  were  two  modifications  made  to  the  compiling  system  to  improve 
its  efficiency  for  the  SOPHIE  application.  In  the  SOPHIE  grammar,  a large 
number  of  the  arcs  check  for  the  occurrence  of  particular  words.  When 
there  is  more  than  one  arc  leaving  a state,  the  ATN  formalism  requires 
that  all  of  these  arcs  be  tried,  even  if  more  than  one  of  these  is  a WRD 
arc  and  an  earlier  WRD  arc  has  succeeded.  This  is  especially  costly,  since 
the  taking  of  an  arc  requires  the  creation  of  a configuration  to  try  the 
remaining  arcs.  In  those  cases  when  it  is  known  that  none  of  the  other 
arcs can succeed, this should be avoided. As a solution to this problem,
the GROUP arc type was added. The GROUP arc allows a set of contiguous
arcs to be designated as mutually exclusive. The form of the GROUP arc is:
(GROUP arc1 arc2 ... arcn). The arcs are tried, one at a time, until the
conditions on one of the arcs are met. This arc is then taken, and the
remaining arcs in the GROUP are forgotten -- not tried. If a PUSH arc is
included in the GROUP, it will be taken if its test is true and the
remaining arcs will not be tried even if the PUSHed-for constituent is not
found. For example, consider the following grammar state:

(S/1
  (GROUP (CAT A T (TO S/2))
         (WRD X T (TO S/3))
         (CAT B T (TO S/4))))

At most, one of the three arcs will be followed. Without GROUPing them
together, it is possible that all three might be followed -- if the word X
had interpretations as both category A and category B.
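
The difference between independent arcs and a GROUP can be sketched as follows for the state S/1 above. This is a Python illustration under simplifying assumptions: arcs are (type, label, target) triples, every test is taken to be T, only CAT and WRD arcs are modelled, and the names `follow_all` and `follow_group` are hypothetical rather than part of the ATN compiler.

```python
def arc_applies(arc, word, categories):
    """True if the current word satisfies a CAT or WRD arc (tests assumed T)."""
    kind, label, _target = arc
    if kind == "CAT":
        return label in categories.get(word, set())
    if kind == "WRD":
        return label == word
    return False


def follow_all(arcs, word, categories):
    """Standard ATN behaviour: every applicable arc yields a configuration."""
    return [arc[2] for arc in arcs if arc_applies(arc, word, categories)]


def follow_group(arcs, word, categories):
    """GROUP behaviour: take the first applicable arc and forget the rest."""
    for arc in arcs:
        if arc_applies(arc, word, categories):
            return [arc[2]]
    return []


S1_ARCS = [("CAT", "A", "S/2"), ("WRD", "X", "S/3"), ("CAT", "B", "S/4")]
CATEGORIES = {"X": {"A", "B"}}  # the word X belongs to both categories

ungrouped = follow_all(S1_ARCS, "X", CATEGORIES)
grouped = follow_group(S1_ARCS, "X", CATEGORIES)
```

With the word X in both categories, the ungrouped state is left along all three arcs, whereas the grouped state is left only along the first.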

The  GROUP  arc  also  provides  an  efficient  means  of  encoding  optional 
constituents.  The  normal  method  of  allowing  options  in  ATN  is  to  provide 
an arc that accepts the optional constituent and a second arc that jumps to
the next state without accepting anything. For example, if in state S/2
the word "very" is optional, the following two arcs would be created:

(S/2
  (WRD VERY T (TO REST-OF-S/2))
  (JUMP REST-OF-S/2 T))

The  inefficiency  arises  when  the  word  "very"  does  occur.  The  first  arc  is 
taken,  but  an  alternative  configuration  that  will  try  the  second  arc  must 
be  created,  and  possibly  later  explored.  By  embedding  these  arcs  in  a 
GROUP,  the  alternative  will  not  be  created  thus  saving  time  and  space.  As 
a result,  it  won't  have  to  be  explored,  possibly  saving  more  time.  A 
warning  should  be  included  here,  that  the  GROUP  arc  can  reject  sentences 
that  might  otherwise  be  accepted.  In  our  example,  "very"  may  be  needed  to 
get  out  of  the  state  REST-OF-S/2.  In  this  respect,  the  GROUP  arc  is  a 
departure from the original ATN philosophy that arcs should be independent,
and  for  this  we  apologize.  However,  for  some  applications,  the  increased 
efficiency  can  be  critical. 

The other change to the compiling system (for the semantic grammar
application) dealt with the preprocessing operations. The preprocessing
facilities described in the last chapter included: 1) lexical analysis to
extract word endings; 2) a substitution mechanism to expand abbreviations,
delete noise words, and canonicalize synonyms; 3) dictionary retrieval
routines; and 4) a compound word mechanism to collapse multi-word phrases.
For the SOPHIE application we added the ability to use the INTERLISP
spelling correction routines and the ability to derive word definitions
from SOPHIE's semantic net. The extraction of definitions from the
semantic  network  for  part  names  and  node  names  reduces  the  size  of  the 
dictionary  and  simplifies  the  operations  of  changing  circuits.  In 
addition,  a mechanism  called  MULTIPLES  was  developed  that  permits  string 
substitution  within  the  input.  This  is  similar  to  the  notion  of 
compounding,  but  differs  in  that  a compound  rule  creates  an  alternative 
lexical  item  while  the  multiple  rule  creates  a different  lexical  item. 
After  the  application  of  a compound  rule,  there  is  an  additional  edge  in 
the  input  chart;  after  a multiple  rule,  the  effect  is  the  same  as  if  the 
user  had  typed  in  a different  string. 
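
The distinction between a compound rule and a multiple rule can be sketched as follows. This is a Python illustration only: the chart is modelled as a list of (start, end, item) edges, and the function names and the sample substitution for "vbe" are assumptions, not the SOPHIE preprocessor.

```python
def word_chart(words):
    """One edge per input word: (start, end, lexical item)."""
    return [(i, i + 1, w) for i, w in enumerate(words)]


def apply_compound(chart, words, phrase, item):
    """Compound rule: add an ALTERNATIVE edge spanning the phrase.

    The original word edges stay in the chart, so both readings survive.
    `phrase` must be non-empty.
    """
    n = len(phrase)
    extra = [(i, i + n, item)
             for i in range(len(words) - n + 1)
             if tuple(words[i:i + n]) == tuple(phrase)]
    return chart + extra


def apply_multiple(words, phrase, replacement):
    """Multiple rule: rewrite the input as if the user had typed `replacement`.

    `phrase` must be non-empty.
    """
    out, i, n = [], 0, len(phrase)
    while i < len(words):
        if tuple(words[i:i + n]) == tuple(phrase):
            out.extend(replacement)
            i += n
        else:
            out.append(words[i])
            i += 1
    return out


words = "does the voltage control the current limiting section".split()
chart = apply_compound(word_chart(words), words,
                       ("voltage", "control"), "voltage/control")
rewritten = apply_multiple(["what", "is", "vbe", "of", "q5"],
                           ("vbe",), ["the", "base", "emitter", "voltage"])
```

After the compound rule the chart still contains the separate edges for "voltage" and "control", so a sentence such as (3) can be parsed with "control" as a verb; after the multiple rule the original string is gone.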

Fuzziness 

The  one  aspect  of  the  LISP  implementation  that  has  not  been 
incorporated  into  the  ATN  framework  is  fuzziness,  the  ability  to  ignore 
words in the input. While we have not worked out the details, the
non-determinism provided by ATNs lends itself to an interesting approach.
In a one-process -- recursive descent -- implementation, the rule that
checks  for  a word  must  decide  (with  information  passed  down  from  higher 
rules)  whether  to  try  skipping  a word,  or  give  up.  The  critical 
information  that  is  not  available  when  this  decision  has  to  be  made  is 
whether  or  not  there  is  another  parse  that  would  use  that  word.  In  the 
ATN,  it  is  possible  to  suspend  a parse  and  come  back  to  it  after  all  other 
paths  have  been  tried.  Fuzziness  could  be  implemented  so  that  rather  than 
skip  a word  and  continue,  it  can  skip  a word  and  suspend,  waiting  for  the 
other  parses  to  fail  or  suspend.  The  end  effect  may  well  be  that  sentences 
are  allowed  to  get  fuzzier  because  there  is  no  danger  of  missing  the 
correct  parse. 
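
Since the details were not worked out, the following is only one possible reading of the skip-and-suspend idea, sketched in Python. Flat word patterns stand in for grammar paths, and the agenda discipline (resume suspended paths only when no active path remains) is an assumption of this sketch, as are all names.

```python
from collections import deque


def fuzzy_parse(words, patterns):
    """Return (pattern name, words skipped) for the first complete parse.

    A path that must skip a word is suspended rather than continued; the
    suspended paths are resumed only after every active path has failed
    or suspended, so an exact parse is always found first.
    """
    active = deque((name, 0, 0, 0) for name in patterns)
    suspended = deque()
    while active or suspended:
        if not active:
            active, suspended = suspended, deque()
        name, p, i, skipped = active.popleft()
        pattern = patterns[name]
        if p == len(pattern) and i == len(words):
            return name, skipped
        if p < len(pattern) and i < len(words) and words[i] == pattern[p]:
            active.append((name, p + 1, i + 1, skipped))
        elif i < len(words):
            # Skip the current word, but wait for the other paths first.
            suspended.append((name, p, i + 1, skipped + 1))
    return None


PATTERNS = {
    "short": ["the", "voltage"],
    "set": ["set", "the", "voltage"],
}
exact = fuzzy_parse("set the voltage".split(), PATTERNS)
fuzzy = fuzzy_parse("set uh the voltage".split(), PATTERNS)
```

In the first call the "short" path could succeed by skipping "set", but it is suspended and never resumed because the "set" path completes without skipping anything.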

Comparison of Results


The  original  motivation  for  changing  to  the  ATN  was  its  perspicuity. 
Appendices  A and  B show  the  BNF/LISP  version,  which  can  be  compared  with 
Appendix E, which shows the ATN version. We suspect that the reader will
find that neither of them is particularly readable, but then there is no
reason to expect that this should be the case. As Winograd (1973) has
pointed  out,  simple  grammars  are  perspicuous  in  almost  any  formalism; 
complex  grammars  are  still  complex  in  any  formalism.  We  found  the  ATN 
formalism  much  easier  to  think  in,  write  in,  and  debug.  The  examples  of 
redundant  processing  that  were  presented  earlier  in  this  chapter  were 
discovered while converting to ATN. For a gross comparison on conciseness,
the ATN grammar requires 70% fewer characters to express than the LISP
version.

The efficiency results were surprising. Table 5.1 gives comparison
timings  between  the  LISP  version  and  the  ATN  compiled  version.  As  can  be 
seen,  the  ATN  version  is  more  than  twice  as  fast.  This  was  pleasantly 
counter-intuitive,  as  we  expected  the  LISP  version  to  be  much  faster  due  to 
the  amount  of  hand  optimization  that  had  been  done  while  encoding  the 
grammar  rules.  In  presenting  the  comparison  timing,  it  should  be  mentioned 
that  there  are  three  differences  between  the  two  systems  that  tended  to 
favor  the  ATN  version. (26)  One  difference  was  the  lack  of  fuzziness  in  the 
ATN  version.  The  LISP  version  spent  time  testing  words  other  than  the 
current  word,  looking  ahead  to  see  if  it  were  possible  to  skip  this  word, 
which  was  not  done  in  the  ATN  version.  The  second  is  the  creation  of 
categories  for  words  during  the  preprocessing  in  the  ATN  version  that 
reduced  the  amount  of  time  spent  accessing  the  semantic  net  and  hence 
reduced  the  time  required  to  perform  a category  membership  test  in  the  ATN 
system.  The  third  was  the  simplification  of  the  grammar  and  increase  in 
the  amount  of  bottom-up  processing  that  could  be  done  because  of  the 
ambiguity allowed in the input chart. In our estimation, the lack of
fuzziness is the only difference that may have had a significant effect,
and this can be included explicitly in the ATN in places where it is
critical, by using TST arcs and suspend actions, without a noticeable
increase in processing time. In conclusion, we are very pleased with the
results of the compiled semantic ATN and feel that the ATN compiler makes
the ATN formalism computationally efficient enough to be used in real
systems.

(26) The extent to which each of these differences contributed is
difficult to gather statistics on, due to the block compiler, which gains
efficiency by hiding internal workings. The exact contribution of each
could certainly be determined but was not deemed worth the effort.


Table  5.1 

Comparison  of  ATN  vs  LISP  Implementation 
Times  (in  seconds)  are  "prepass"  + "parsing" 


1) What is the output voltage?

   LISP - .024 + .018 = .042
   ATN  - .048 + .033 = .081

2) What is the voltage between there and the base of Q6?

   LISP - .038 + .039 = .077
   ATN  - .090 + .046 = .136

3) Q5?

   LISP - .010 + .046 = .056
   ATN  - .013 + .060 = .073

4) What is the output voltage when the voltage control is set to .5?

   LISP - .045 + .038 = .083
   ATN  - .098 + .046 = .144

5) If Q6 has an open emitter and a shorted base collector junction what
   happens to the voltage between its base and the junction of the voltage
   limiting section and the voltage reference source?

   LISP - .206 + .186 = .394
   ATN  - .259 + .090 = .349

Chapter 6

OBSERVATIONS  ON  STUDENT  USAGE 

When  we  began  developing  a natural  language  processor  for  an 
instructional  environment,  we  knew  it  had  to  be  (1)  fast,  (2)  habitable, 
and  (3)  self-teaching.  The  basic  conclusion  that  has  arisen  from  the  work 
presented  here  is  that  it  is  possible  to  satisfy  these  constraints.  The 
notion  of  semantic  grammar  (presented  in  Chapter  4)  provides  a paradigm  for 
organizing  the  knowledge  required  in  the  understanding  process  that  permits 
efficient  parsing.  In  addition,  semantic  grammar  aids  the  habitability  by 
providing  insights  into  a useful  class  of  dialogue  constructs,  and  permits 
efficient  handling  of  such  phenomena  as  pronominalizations  and  ellipses. 
The  need  for  a better  formalism  for  expressing  semantic  grammars  led  to 
the  use  of  Augmented  Transition  Networks  (presented  in  Chapter  5).  The 
ability  of  the  ATN-expressed  semantic  grammar  to  satisfy  the  above  stated 
requirements  is  demonstrated  in  the  natural  language  front-end  for  the 
SOPHIE  system. 

A point that needs to be stressed is that the SOPHIE system has been
(and  is  being)  used  by  uninitiated  students  in  experiments  to  determine  the 
pedagogical  effectiveness  of  its  environments.  While  much  has  been  learned 
about  the  problems  of  using  a natural  language  interface,  these  experiments 
were  not  "debugging"  sessions  for  the  natural  language  component.  The 
natural  language  component  has  unquestionably  reached  a state  at  which  it 
can  be  conveniently  used  to  facilitate  learning  about  electronics.  In  this 
chapter,  we  will  describe  the  experiences  of  students  using  the  natural 
language  component,  and  present  some  ideas  on  handling  erroneous  inputs. 

Impressions, Experiences and Observations

Prior  to  any  exposure  to  SOPHIE,  a group  of  four  students  were  asked 
to  write  down  all  of  the  ways  they  could  think  of  for  requesting  the 
voltage  at  a particular  node.  Although  the  intent  of  the  experiment  was  to 
determine  the  range  of  paraphrases  that  students  might  be  inclined  to  use 
before  they  were  aware  of  the  system's  linguistic  limitations,  a more 
interesting  result  emerged.  Each  student  wrote  down  one  phrasing  very 
quickly  but  had  a difficult  time  thinking  of  a second,  even  though  the 
initial phrasings of three of the students were in fact different! One
student quit, exclaiming "But there is only one way to ask that!" This
same inability to perform linguistic paraphrase carried over to the actual
interaction with SOPHIE via terminal. Whenever the system did not accept a
query,  there  was  a marked  delay  before  the  student  tried  again.  Sometimes 
the  student  would  abandon  his  line  of  questioning  completely.  At  the  same 
time,  data  collected  over  many  sessions  indicated  that  there  was  no 
standard  — canonical  — way  to  phrase  a question.  Table  6.1  provides  some 
examples  of  the  range  of  phrasings  used  by  students  to  ask  for  the  voltage 
at  a node. 


Table  6.1 

Sample  Student  Inputs 


The  following  are  some  of  the  input  lines  typed  by  students  with  the  intent 
of  discovering  the  voltage  at  a node  in  the  circuit. 

What  is  the  voltage  at  node  1? 

What  is  the  voltage  at  the  base  of  Q5? 

How  much  voltage  at  N10? 

And  what  is  the  voltage  at  N1? 

N9? 

V at  the  neg  side  of  C6? 

V11 is?

What is the voltage from the base of transistor Q5 to ground?

What V at N16?

Coll. of Q5?

Node  16  Voltage? 

What  is  the  voltage  at  pin  1? 

Output? 


As  Table  6.1  shows,  students  are  likely  to  conceive  of  their  questions  in 
many  ways  and  to  express  each  of  these  conceptions  in  any  of  several 
phrasings.  Yet  other  experiences  indicate  that  they  lack  the  ability  to 
easily  convert  to  another  conceptualization  or  phrasing.  Since  the 
non-acceptance  of  questions  creates  a major  interruption  in  the  student's 
thought  process,  the  acceptance  of  many  different  paraphrases  is  critical 
to  maintaining  flow  in  the  student's  problem  solving. 

Another  interesting  phenomenon  that  occurred  during  sessions  was  the 
change  in  the  linguistic  behavior  of  the  students  as  they  used  the  system. 
Initially,  queries  were  stated  as  complete  English  questions,  generally 
stated  in  templates  created  by  the  students  from  the  written  examples  of 
sessions  that  we  had  given  them.  If  they  needed  to  ask  something  that  did 
not  exactly  fit  one  of  their  templates,  they  would  try  a minor  variant.  As 
they became more familiar with the mode of interaction, they began to use
abbreviations, to leave out parts of their questions and, in general, to
assume that the system was following their interaction. After five hours
of  experience  with  the  system,  almost  all  of  one  student's  queries 
contained  abbreviations  and  one  in  six  depended  on  the  context  established 
by  previous  statements. 

FEEDBACK - When the Grammar Fails

From our experiences with students using SOPHIE, we have been
impressed with the importance of providing feedback to unacceptable inputs
-- what to do when the system doesn't understand an input. While it may
appear that in a completely habitable system all inputs would be
understood, no system has ever attained this goal and none will in the
foreseeable future. To be natural to a naive user, an intelligent system
should act intelligently when it fails too. The first step towards having
a system fail intelligently is the identification of possible areas of
error. In students' use of the SOPHIE system, we have found the following
types of errors to be common:

(1) Spelling errors and mis-typings - "Shortt the CE og Q3 and opwn its
base"; "What isthe vbe Q5."

(2) Inadvertent omissions - "What is the BE of Q5?" (The user left out the
quantity to measure. Note that in other contexts this is a well formed
question.)

(3) Slight misconceptions that are predictable - "What is the output of
transistor Q3?" (The output of a transistor is not defined); "What
is the current thru node 1?" (Nodes are places where voltage is
measured and may have numerous wires associated with them); "What is
R9?" (R9 is a resistor); "Is Q5 conducting?" (The laboratory section
of SOPHIE gives information that is directly available from a real lab
such as currents and voltages.)

(4) Gross misconceptions whose underlying meaning is well beyond designed
system capabilities - "Make the output voltage 30 volts"; "Turn on the
power supply and tell me how the unit functions"; "What time is it?".

The  best  technique  for  dealing  with  each  type  of  error  is  an  open  problem. 
In  the  remainder  of  this  section,  we  will  discuss  the  solutions  used  in  the 
SOPHIE  system  to  provide  feedback. 

The  use  of  a spelling  correction  algorithm  (borrowed  from  INTERLISP) 
has  proven  to  be  a satisfactory  solution  to  type  1 errors.  During  one 
student's  session,  spelling  correction  was  required  on,  and  resulted  in 
proper understanding of, 10% of the questions. The major failings of the
INTERLISP algorithm are the restriction on the size of the target set of
correct  words  (time  increases  linearly  with  the  number  of  words)  and  its 
failure  to  correct  run-on  words.  (The  time  required  to  determine  if  a word 
may  be  two  (possibly  misspelled)  words  run  together  increases  very  quickly 
with  the  length  of  the  word  and  the  number  of  possibly  correct  words.  With 
no  context  to  restrict  the  possible  list  of  words,  the  computation  involved 
is  prohibitive.)  A potential  solution  to  both  shortcomings  would  be  to  use 
the  context  of  the  parser  to  reduce  the  possibilities  when  it  reaches  the 
unknown  word.  Because  of  the  nature  of  the  grammar,  this  would  allow 
semantic  context  as  well  as  syntactic  context  to  be  used. 

Of  course,  the  use  of  any  spelling  correction  procedure  has  some 
dangers.  A word  that  is  spelled  correctly  but  that  the  system  doesn't  know 
may  be  changed  through  spelling  correction  to  a word  the  system  does  know. 
For  example  if  the  system  doesn't  know  the  word  "top"  but  does  know  "stop", 
a user's  command  to  "top  everything"  can  be  disastrously  misunderstood. 
For  this  reason,  words  like  "stop"  are  not  spelling  corrected. 
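
The two ideas above (restricting correction to the words the parser expects
at the point of failure, and never correcting a word into a dangerous command)
can be sketched as follows. This is only an illustration in Python, not the
INTERLISP algorithm: the similarity measure comes from Python's standard
difflib module, and the names PROTECTED, correct_word and the cutoff value are
assumptions made for the example.

```python
from difflib import get_close_matches

# Hypothetical set of words that must never be produced by a correction.
PROTECTED = {"stop"}


def correct_word(word, expected, cutoff=0.7):
    """Return `word` if the parser expects it, else the closest expected
    word that is not protected, else None."""
    word = word.lower()
    if word in expected:
        return word  # correctly spelled and expected: no correction needed
    # Only words expected in the current parser context are candidates,
    # which keeps the target set small; protected words are excluded.
    candidates = [w for w in expected if w not in PROTECTED]
    matches = get_close_matches(word, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


expected_here = ["short", "open", "stop"]
print(correct_word("Shortt", expected_here))  # short
print(correct_word("top", expected_here))     # None: never turned into "stop"
```

Because the candidate list is supplied by the caller, the cost of a correction
grows with the number of words expected at that point rather than with the
size of the whole vocabulary.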

Our  solution  to  predictable  misconceptions  (type  3 errors)  is  to 
recognize  them  and  give  error  messages  that  are  directed  at  correcting  the 
misconception.  We  are  currently  using  two  different  methods  of 
recognition.  One  is  to  loosen  up  the  grammar  so  that  it  accepts  plausible 
but  meaningless  sentences.  This  technique  provides  the  procedural 
specialists  called  by  the  plausible  parse  enough  context  to  make  relevant 
comments.  For  example,  the  concept  of  current  through  a node  is  accepted 
by  the  grammar  even  though  it  is  meaningless.  The  specialist  that  performs 
measurements  must  then  check  its  arguments  and  provide  feedback  if 
necessary:

>>  WHAT  IS  THE  CURRENT  THRU  NODE  4? 

The  current  thru  a node  is  not  meaningful  since  by  Kirchoff's  law 
the  sum  of  the  currents  thru  any  node  is  zero.  Currents  can  be 
measured  thru  parts  (e.g.  CURRENT  THRU  C6)  or  terminals 
(e.g. CURRENT THRU THE COLLECTOR OF Q2).

Notice  that  the  response  to  the  question  presents  some  examples  of  how  to 
measure  the  currents  along  wires  that  lead  into  the  mentioned  node. 
Examples  of  questions  that  will  be  accepted  and  are  relevant  to  the 
student's  needs  are  among  the  best  possible  feedback. 
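
The first method can be pictured as a specialist that validates its own
arguments after a deliberately loose parse. The sketch below is in Python
rather than LISP, and the names Place, measure and meter are hypothetical;
only the wording of the message is taken from the dialogue above.

```python
from dataclasses import dataclass


@dataclass
class Place:
    kind: str  # "node", "part" or "terminal"
    name: str


def measure(quantity, place, meter):
    """Return a reading from `meter`, or a message aimed at the likely
    misconception when the arguments are plausible but meaningless."""
    if quantity == "current" and place.kind == "node":
        # The grammar accepted "current thru node N"; the specialist
        # rejects it and suggests questions that will be accepted.
        return ("The current thru a node is not meaningful. Currents can be "
                "measured thru parts (e.g. CURRENT THRU C6) or terminals "
                "(e.g. CURRENT THRU THE COLLECTOR OF Q2).")
    return meter(quantity, place)


# `meter` stands in for whatever actually performs the measurement.
print(measure("current", Place("node", "4"), lambda q, p: 0.0))
```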

The second method of recognizing common misconceptions is to "key"
feedback  off  single  words  or  groups  of  words.  In  the  following  examples, 
the  "keys"  are  "or"  and  "turned  on".  Notice  that  the  response  presents  a 
general  characterization  of  the  violated  limitations  as  well  as  suggestions 
for  alternative  lines  of  attack. 

>> COULD Q1 OR Q2 BE SHORTED?

I can only handle one question, hypothesis, etc. at a time. The fact
that you say 'OR' indicates that you may be trying to express two
concepts  in  the  same  sentence.  Maybe  you  can  break  your  statement 
into  two  or  more  simple  ones. 

>>  IS  THE  CURRENT  LIMITING  TRANSISTOR  TURNED  ON? 

The  laboratory  section  of  SOPHIE  is  designed  to  provide  the  same 
elementary  measurements  that  would  be  available  in  a real  lab.  If  you 
want  to  determine  the  state  of  a transistor,  measure  the  pertinent 
currents  and  voltages. 

These  methods  of  handling  type  3 errors  have  proved  to  be  very  helpful. 
However,  they  require  that  all  of  the  misconceptions  must  be  predicted  and 
programmed  for  in  advance.  This  limitation  makes  them  inapplicable  to 
novel  situations. 
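
A minimal sketch of keying feedback off words or groups of words, assuming a
simple table of key phrases (the table layout and the function name
keyed_feedback are illustrative, not SOPHIE's actual data structures; the
messages are abbreviated from the dialogue above):

```python
# Each entry pairs a key (a tuple of consecutive words) with a message.
KEYED_FEEDBACK = [
    (("or",),
     "I can only handle one question, hypothesis, etc. at a time. "
     "Maybe you can break your statement into two or more simple ones."),
    (("turned", "on"),
     "The laboratory section of SOPHIE is designed to provide the same "
     "elementary measurements that would be available in a real lab."),
]


def keyed_feedback(sentence):
    """Return the message for the first key found in `sentence`, or None."""
    words = sentence.lower().strip("?.! ").split()
    for key, message in KEYED_FEEDBACK:
        n = len(key)
        # Slide a window of len(key) words across the sentence.
        if any(tuple(words[i:i + n]) == key
               for i in range(len(words) - n + 1)):
            return message
    return None


print(keyed_feedback("COULD Q1 OR Q2 BE SHORTED?"))
```

As the text notes, every key has to be anticipated in advance; an input that
contains none of them simply yields no feedback from this mechanism.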

The most severe problems a user has stem from type 2 (omissions) and
type 4 (major misconceptions) errors. (Type 3 errors that haven't been
predicted are considered type 4 errors.) After a simple omission, the user
may not see that he has left anything out and may conclude that the system
doesn't know that concept or phrasing of that concept. For example when
the user types "What is the BE of Q5" instead of "What is the VBE of Q5?",
he may decide that it is unacceptable because the system doesn't allow
"VBE" as an abbreviation of "base emitter voltage". For type 4 errors, the
user may waste a lot of time and energy attempting several rephrasings of
his query, none of which can be understood because the system doesn't know
the concept the user is trying to express. For example, no matter how it
is phrased, the system won't understand "Make the output voltage 30 volts"
because measurements cannot be directly changed; only controls and
specifications of parts can be changed.

The  feedback  necessary  to  correct  both  of  these  classes  of  errors  must 
identify  any  concepts  in  the  statement  that  are  understood  and  suggest  the 
range of things that can be done to/with these concepts. For type 2
errors, this will help the user see his omission. For type 4 errors, it
may suggest alternative conceptualizations that will allow the user to get
at  the  same  information  (for  example,  to  change  the  output  voltage 
indirectly  by  changing  one  of  the  controls)  or  at  least  provide  him  with 
enough  information  to  decide  when  to  quit. 

The  notion  of  semantic  grammar  may  be  useful  in  developing  a general 
solution  along  the  following  lines:  A bottom-up  or  island  parsing  scheme 
could  be  used  to  identify  well-formed  constituents. (27)  Since  the  grammar 
is  semantically  based,  the  constituents  that  are  found  represent  "islands" 
of  meaningful  phrases.  The  ATN  representation  of  the  semantic  grammar  can 
then  be  inspected  to  discover  possible  ways  of  combining  these  islands.  If 
a good  match  is  found,  the  grammar  can  be  used  to  generate  a response  that 
indicates  what  other  semantic  parts  are  required  for  that  rule.  Even  if  no 
good  matches  are  found,  a positive  statement  may  be  made  that  explains  the 
set  of  possible  ways  the  recognized  structures  could  be  understood.  Much 
more  work  is  required  in  the  area  of  unacceptable  inputs  before  natural 
language  systems  will  feel  really  natural  to  naive  users. 
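
The proposal above is a direction for future work, so the following Python
sketch is only one possible reading of it. It finds single-word "islands" by
semantic category and then reports, for each rule that shares a category with
the input, which constituents are still missing. The tiny vocabulary, the RULES
table and both function names are assumptions made for the example; a real
implementation would inspect the ATN itself.

```python
QUANTITIES = {"voltage", "current", "resistance", "power"}
JUNCTION_TYPES = {"eb", "be", "ec", "ce", "cb", "bc"}
PARTS = {"q5", "r9", "c2"}  # hypothetical stand-in for the semantic network

# Hypothetical, flattened view of two <measurement> rules: the semantic
# categories each one requires.
RULES = {
    "<meas/quant> of <part/spec>": ["meas/quant", "part/name"],
    "<junction/type> <meas/quant> of <transistor/spec>":
        ["junction/type", "meas/quant", "part/name"],
}


def find_islands(words):
    """Return (category, word) pairs for every word that is meaningful
    on its own, ignoring everything else."""
    islands = []
    for raw in words:
        w = raw.lower().strip("?.,")
        if w in QUANTITIES:
            islands.append(("meas/quant", w))
        elif w in JUNCTION_TYPES:
            islands.append(("junction/type", w))
        elif w in PARTS:
            islands.append(("part/name", w))
    return islands


def missing_parts(islands):
    """For each rule that uses at least one recognized category, list the
    categories the input still lacks."""
    found = [category for category, _ in islands]
    suggestions = {}
    for rule, needed in RULES.items():
        missing = [c for c in needed if c not in found]
        if len(missing) < len(needed):
            suggestions[rule] = missing
    return suggestions


islands = find_islands("What is the BE of Q5?".split())
print(islands)
print(missing_parts(islands))
```

For the type 2 example discussed earlier, both rules report that a
<meas/quant> is missing, which is the hint the user needs.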


(27) William Woods and Geoff Brown are presently refining such a bottom-up
parsing technique for ATN grammars for use in the BBN Speech project (Woods

Chapter 7

CONCLUDING  DISCUSSION 


Future  Research  Areas 

The SOPHIE semantic grammar system is designed for a particular
context  --  trouble  shooting  --  within  a particular  domain,  namely, 
electronics.  It  represents  the  compilation  of  those  pieces  of  knowledge 
which  are  general  (linguistic)  together  with  specific  domain  dependent 
knowledge.  In  its  present  form,  it  is  unclear  which  knowledge  belongs  to 
which  area.  The  development  of  semantic  grammars  for  other  applications 
and  extensions  to  the  semantic  grammar  mechanism  to  include  other 
understood  linguistic  phenomena  will  clarify  this  distinction. 

While the work presented in this report has dealt mostly with one area
of  application,  the  notion  of  semantic  grammar  as  a method  of  integrating 
knowledge  into  the  parsing  process  has  wider  applicability.  Two 
alternative  applications  of  the  technique  have  been  completed.  One  deals 
with  simple  sentences  in  the  domain  of  attribute  blocks  (Brown  et  al. 
1975).  While  the  sublanguage  accepted  in  the  attribute  blocks  environment 
is  very  simple,  it  is  noteworthy  that  within  the  semantic  grammar  paradigm, 
a simple  grammar  was  quickly  developed  that  greatly  improved  the 
flexibility  of  the  input  language.  The  other  completed  application  deals 
with questions about the editing system NLS (Grignetti et al. 1975). In
this  application,  most  questions  dealt  with  editing  commands  and  their 
arguments,  and  fit  nicely  into  the  case  frame  notion  mentioned  in  Chapter 
5.  The  case  frame  use  of  semantic  grammar  is  being  considered  for,  and  may 
have  its  greatest  impact  on,  command  languages.  Command  languages  are 
typically  case  centered  around  the  command  name  that  requires  additional 
arguments  (its  cases).  The  combination  of  the  semantic  classification 
provided  by  the  semantic  grammar  and  the  representation  of  case  rules 
permitted  by  ATNs  should  go  a long  way  towards  reducing  the  rigidity  of 
complex  command  languages  such  as  those  required  for  message  processing 
systems.  The  combination  should  also  be  a good  representation  for  natural 
language  systems  in  domains  where  it  is  possible  to  develop  a strong 
underlying  conceptual  space,  such  as  management  information  systems 
(Malhotra  1975). 

The extension of the semantic grammar to incorporate existing
linguistic  processing  techniques  is  another  potentially  fruitful  research 
area.  One  of  the  ways  semantic  grammar  gains  efficiency  is  to  separate  the 
processing  of  syntactically  similar  sentences  on  semantic  grounds  when  it 
is  useful  to  do  so.  However,  this  prevents  the  uniform  incorporation  of, 
for  example,  Woods'  (1973b)  solution  to  the  problems  of  relative  clause 
modification,  quantifiers  and  conjunction.  One  means  of  integrating  these 
techniques  would  be  to  develop  an  intermediate  target  language  that 
maintains the advantages of the semantic grammar approach while allowing
uniform solutions to other problems. It may even be possible to adopt
Woods'  query  language,  allowing  the  semantic  grammar  to  dictate  the 
functions  within  the  "propositions"  and  "commands".  An  alternative  attack 
would be to use a "syntactic" processing phase, incorporating the desired
techniques, that canonicalizes the input before it is processed by the
semantic grammar. In this method, the semantic grammar would be viewed as
an interpretation phase of the understanding process, but one which works
on a much less structured syntactic parse than, for example, the LUNAR system.

CONCLUSIONS 

In  the  course  of  this  report,  we  have  described  the  evolution  of  a 
natural  language  front-end  from  keyword  beginnings  to  a system  capable  of 
using  complex  linguistic  knowledge.  The  guiding  strand  has  been  the 
utilization  of  semantic  information  to  produce  efficient  natural  language 
processors.  There  were  several  highlights  that  represent  noteworthy  points 
in  the  spectrum  of  useful  natural  language  systems.  Toward  the  keyword  end 
of  the  scale,  the  procedural  encoding  technique  with  fuzziness  (Chapter  4 
and  Appendix  B)  allows  simple  natural  language  input  to  be  accepted  without 
introducing the complexity of a new formalism. Encoding the rules as
procedures allows flexible control of the fuzziness, and the semantic
nature of the rules provides the correct places to take advantage of the
flexibility. As the language covered by the system becomes more complex,
the  additional  burden  of  a grammar  formalism  will  more  than  pay  for  itself 
in  terms  of  ease  of  development  and  reduction  in  complexity.  The  ATN 
compiling  system  allows  for  the  consideration  of  the  ATN  formalism  by 
reducing  its  runtime  cost,  making  it  comparable  to  a direct  procedural 
encoding. The natural language front end now used by SOPHIE is constructed
by compiling a semantic ATN. As the linguistic complexity of the language
accepted by the system increases, the need for more syntactic knowledge in
the grammar becomes greater. Unfortunately, this often works at cross
purposes  with  the  semantic  character  of  the  grammar.  It  would  be  nice  to 
have  a general  grammar  for  English  syntax  that  could  be  used  to  preprocess 
sentences;  however,  one  is  not  forthcoming.  A general  solution  to  the 
problem  of  incorporating  semantics  with  the  current  state  of  incomplete 
knowledge  of  syntax  remains  an  open  research  problem.  In  the  foreseeable 
future,  any  system  will  have  to  be  an  engineering  trade-off  between 
complexity  and  generality  on  one  hand  and  efficiency  and  habitability  on 
the  other.  We  have  presented  several  techniques  that  are  viable  bargains 
in  this  trade-off. 

Appendix A


BNF Description of Part of the
SOPHIE Semantic Grammar


This appendix gives a BNF-like description of part of the language
accepted by SOPHIE. Included are all of the rules necessary to parse a
"measurement". Examples of "measurements" are "voltage at N1", "base
emitter current of Q5", and "output voltage". The grammar is implemented
as LISP functions and an example is listed in Appendix B.

In the description, alternatives on the right-hand side are separated
by ! or are listed on separate lines. Brackets [] enclose optional
elements. An asterisk * is used to mark notes about a particular rule.
Non-terminals are designated by names enclosed in angle brackets <>.


The  Grammar 

<circuit/place> := <terminal> ! <node>

<diode/spec> := <diode> ! <zener/diode>
                <section> diode ! <section> zener/diode

<junction> := <junction/type> [of] <transistor/spec>
              <transistor/term/type> and <transistor/term/type> [of]
                  [<transistor/spec>]
              <transistor/term/type> to <transistor/term/type> [of]
                  [<transistor/spec>]

<junction/type> := eb ! be ! ec ! ce ! cb ! bc

<meas/quant> := voltage ! current ! resistance* ! power
                *means measured resistance

<measurement> := <section> [output*] [<meas/quant>]
                 output* <meas/quant> [of] <section>
                 output* [<meas/quant>] [of <transformer>]
                 <transformer> <meas/quant>
                 <meas/quant> between** <circuit/place> and
                     <circuit/place>
                 <meas/quant> of*** <part/spec>
                 <meas/quant> between output terminals
                 <meas/quant> of <junction>
                 <meas/quant> of <circuit/place>
                 <meas/quant> from <junction>
                 <meas/quant> of <section>
                 <meas/quant> of <pronoun>
                 <junction/type> <meas/quant> [of <transistor/spec>]
                 <transistor/term/type> <meas/quant> of
                     [<transistor/spec>]

                 *input also
                 **from-to also works
                 ***at, thru, in, into, across and through also work

<node> := junction of <part/spec> and <part/spec>
          node between <section> and <section>
          [point] between <part/spec> and <part/spec>
          <node/name> ! [node] <node/number>
          <pronoun>

<num/spec> := "any positive number" [k] ! one

<part/spec> := <part/name> ! <load/spec> ! <section> <part/type>
               <pronoun>

<pot/spec> := (illegible) ! vc ! cc

<pronoun> := it ! that ! "type"

<terminal> := output terminal ! <transistor/term> ! center/tap
              positive terminal [<part/spec>] ! positive one
              negative terminal [<part/spec>] ! negative one
              anode [<diode/spec>] ! cathode [<diode/spec>]
              wiper [<pot/spec>]

<transistor/spec> := <transistor> ! <section> transistor ! <pronoun>

<transistor/term> := <transistor/term/type> [<transistor/spec>]

<transistor/term/type> := base ! collector ! emitter

<transistor>, <capacitor>, <diode>, <resistor>, <transformer> and
<zener/diode> all check the semantic network and parse correct part names,
e.g. r9, q6.

<section> uses the semantic network to determine if a word is a section of
the unit, e.g. current/limiter.

<part/name> uses the semantic network to see if a word is the name of a
part, e.g. r?, c?, t2.

<node/name> checks the semantic network for node names.

Appendix B

A LISP Rule from the Semantic Grammar

This  appendix  describes  the  method  of  encoding  the  grammar  as  LISP 
procedures.  The  ways  of  expressing  a non-terminal  are  embodied  in  a 
grammar function. Each grammar function takes at least two arguments:
STR, the list of words to be recognized, and N, the degree of fuzziness
allowed. The grammar function, in effect, must determine whether the
beginning of the string STR contains an occurrence of the corresponding
non-terminal. There are generally two types of checks that a grammar
function performs. One is a check for the occurrence of a word or words
which satisfy certain predicates. This checking is done with two
functions  --  CHECKLST  and  CHECKSTR.  CHECKLST  looks  for  a word  in  the 
string  matching  any  of  a list  of  words.  CHECKSTR  looks  for  a word  in  the 
string  satisfying  an  arbitrary  predicate.  It  is  through  these  functions 
that  the  parser  implements  its  fuzziness.  For  example,  if  CHECKSTR  is 
called  with  the  string  "resistor  R9"  and  a predicate  which  determines  if  a 
word is the name of a part (e.g. "R9"), CHECKSTR will succeed by skipping
the word "resistor", which in this phrase, is a noise word.
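
The skipping behaviour can be sketched as below. This is a Python
approximation, not the LISP source: it assumes that N bounds the number of
words that may be skipped, and it returns an index where the original returns
a pointer into the word list. The names check_str and check_lst merely echo
CHECKSTR and CHECKLST.

```python
def check_str(words, predicate, n):
    """Return the index of the first word satisfying `predicate`, skipping
    at most `n` leading words; return None if there is no such word."""
    for i, word in enumerate(words[: n + 1]):
        if predicate(word):
            return i
    return None


def check_lst(words, targets, n):
    """Like check_str, but succeed on any word in the list `targets`."""
    wanted = {t.lower() for t in targets}
    return check_str(words, lambda w: w.lower() in wanted, n)


def is_part(word):
    # Hypothetical stand-in for a lookup in the semantic network.
    return word.upper() in {"R9", "Q5"}


print(check_str(["resistor", "R9"], is_part, 1))  # 1: "resistor" is skipped
print(check_str(["resistor", "R9"], is_part, 0))  # None: no skipping allowed
```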

The  other  usual  type  of  operation  performed  by  the  grammar  functions 
is  to  check  for  the  occurrence  of  other  non-terminals.  This  is  done  by 
calling  the  proper  function  (grammar  rule)  and  passing  it  the  correct 
position  in  the  input  string. 

If a grammar rule is successful, the function passes back two pieces
of information. First, it returns some indication of how much of the input
string is accepted (i.e. where it stopped). The convention adopted is
that the grammar rule returns as its value a pointer to the last word in
the string accepted by the rule. Second, the function passes back a
structural description of the phrase that was parsed. This structure is
passed back in the free variable RESULT (analogous to an ATN's * upon
return from a PUSH).


Listed below is the grammar rule for the concept of a junction of a
transistor. This rule accepts phrases such as "base emitter junction of
Q5", "BE of the current limiting transistor", or "collector emitter
junction".

(<JUNCTION>
  (LAMBDA (STR N)
    (PROG (TS1 R1)
      (RETURN
        (AND
          (* COMMENT A)
          (OR (AND (SETQ TS1 (<JUNCTION/TYPE> STR N))
                   (SETQ R1 RESULT))
              (AND (SETQ TS1 (<TRANSISTOR/TERM/TYPE> STR N))
                   (SETQ R1 RESULT)
                   (SETQ TS1
                         (<TRANSISTOR/TERM/TYPE>
                           (CDR (CHECKLST (CDR TS1)
                                          (QUOTE (AND TO))))))
                   (SETQ R1 (JUNCTION-OF-TERMS R1 RESULT))))
          (* COMMENT B)
          (COND
            ((SETQ STR (<TRANSISTOR/SPEC>
                         (CDR (GOBBLE (GOBBLE TS1 (QUOTE (JUNCTION)))
                                      (QUOTE (OF))))))
             (SETQ RESULT (LIST R1 RESULT))
             STR)
            (T (SETQ RESULT (LIST R1 (LIST (QUOTE PREF)
                                           (QUOTE (TRANSISTOR)))))
               TS1)))))))


COMMENT A:

The first thing that is looked for is either a <junction/type> (BE, emitter
collector, etc.) or two <transistor/term/type>s (base, emitter or
collector) separated by the words "and" or "to". If two terminals are
found, the function JUNCTION-OF-TERMS is called to determine the proper
junction. In either case, the place where the successful subsidiary rule
left off is saved in TS1 and the meaning of the accepted phrase is saved in
R1.

COMMENT B:

The next thing needed for a junction is a transistor (<TRANSISTOR/SPEC>).
<TRANSISTOR/SPEC> looks for an occurrence of a transistor, e.g. "Q5" or
"current limiting transistor". GOBBLE is a function for skipping
relational words when they are not used to restrict the remaining part of
the phrase. If a transistor is not found, a deletion is hypothesized and a
call to PREF is constructed. If the transistor has been pronominalized as
in "the base emitter of it", <TRANSISTOR/SPEC> would recognize "it". In
either case the semantics of the recognized phrase (something like (EB Q5))
is put into RESULT and a pointer to the last recognized word is returned as
the value of <JUNCTION>.
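
For readers unfamiliar with LISP, the control flow of the rule can be
paraphrased in Python. This is a simplified sketch under several assumptions:
fuzziness is omitted, the vocabulary tables are hypothetical stand-ins for the
semantic network, two adjacent terminal words are treated as a junction type,
and the function returns a (semantics, words consumed) pair instead of setting
the free variable RESULT.

```python
JUNCTION_TYPES = {"eb", "be", "ec", "ce", "cb", "bc"}
TERMINALS = {"base": "B", "emitter": "E", "collector": "C"}
TRANSISTORS = {"q5", "q6"}  # hypothetical stand-in for the semantic network


def parse_junction(words):
    """Return ((junction, transistor), words_consumed), or None if the
    words do not begin with a junction phrase."""
    words = [w.lower() for w in words]
    # COMMENT A: a junction type, or two terminal types, optionally
    # separated by "and" or "to".
    if words and words[0] in JUNCTION_TYPES:
        junction, pos = words[0].upper(), 1
    elif len(words) >= 2 and words[0] in TERMINALS and words[1] in TERMINALS:
        junction, pos = TERMINALS[words[0]] + TERMINALS[words[1]], 2
    elif (len(words) >= 3 and words[0] in TERMINALS
          and words[1] in ("and", "to") and words[2] in TERMINALS):
        junction, pos = TERMINALS[words[0]] + TERMINALS[words[2]], 3
    else:
        return None
    junction_end = pos  # plays the role of TS1
    # COMMENT B: skip the relational words "junction" and "of", then look
    # for a transistor.
    for noise in ("junction", "of"):
        if pos < len(words) and words[pos] == noise:
            pos += 1
    if pos < len(words) and words[pos] in TRANSISTORS:
        return (junction, words[pos].upper()), pos + 1
    # No transistor: hypothesize a deletion and leave a PREF placeholder
    # to be resolved from context; as in the LISP rule, only the words up
    # to the junction itself are reported as consumed.
    return (junction, ("PREF", ("TRANSISTOR",))), junction_end


print(parse_junction("base emitter junction of Q5".split()))
```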

There  are  approximately  80  grammar  rules  in  SOPHIE’s  grammar. 

Appendix C

Sample Parses and Parse Times for the LISP Implementation

This appendix presents some examples of sentences handled by the
natural language processor together with their parse times. Under each
statement,  the  semantic  interpretation  returned  by  the  parser  is  given. 
The  semantic  interpretation  is  a function  call  which  when  evaluated 
performs  the  processing  required  by  the  statement.  Parse  times  are  given 
in  milliseconds. 

Insert a fault.
(INSERTFAULT NIL)
35 ms

What is the output voltage?
(MEASURE VOLTAGE NIL OUTPUT)
40 ms

What is the voltage between the current limiting transistor
and the constant current source?
(MEASURE VOLTAGE (NODE/BETWEEN
                   (FINDPART CURRENT/LIMITER TRANSISTOR)
                   CURRENT/SOURCE))
335 ms

What is the voltage between there and the base of Q5?
(MEASURE VOLTAGE (PREF (NODE TERMINAL)) (BASE Q6))
?0 ms

Q5?
(REFERENCE ((TRANSISTOR) Q5))
60 ms

Could the problem be that Q5 is bad?
(TESTFAULT Q5 BAD)
100 ms

Could it be shorted?
(TESTFAULT (PREF (PART JUNCTION TERMINAL)) SHORT)
75 ms

If R8 were 30k what would the output voltage be?
(IFTHEN (R8 30000.0 VALUE)
        (MEASURE VOLTAGE NIL OUTPUT))
220 ms

If C2 were leaky what would the voltage across it be?
(IFTHEN (C2 LEAKY)
        (MEASURE VOLTAGE (PREF (PART JUNCTION))))
120 ms

What is the output voltage when the voltage control is set to .5?
(RESETCONTROL (STQ VC .5)
              (MEASURE VOLTAGE NIL OUTPUT))
85 ms

What is it with it set at .6?
(RESETCONTROL (STQ (PREF (POT LOAD SWITCH)) .6)
              (REFERENCE NIL))
110 ms


If it is set to .9?
(RESETCONTROL (STQ (PREF (POT LOAD SWITCH)) .9)
              (REFERENCE NIL))
135 ms

What is the current thru the cc when the vc is set to 1.0?
(RESETCONTROL (STQ VC 1.0)
              (MEASURE CURRENT CC))
190 ms

If Q6 has an open emitter and a shorted base collector
junction, what happens to the voltage between its base and
the junction of the voltage limiting section and the voltage
reference source?
(IFTHEN
  (MULT ((EMITTER Q6) OPEN)
        ((BC (PREF (TRANSISTOR))) SHORT))
  (MEASURE VOLTAGE
    (BASE (PREF (TRANSISTOR)))
    (NODE/BETWEEN VOLTAGE/LIMITER REFERENCE/VOLTAGE)))
400 ms

Appendix D

Examples  of  ATN  Compilation 

This  appendix  presents  a simple  augmented  transition  network  grammar 
along  with  two  different  programs  compiled  from  it  and  a trace  of  the  first 
program parsing a sentence. The ATN grammar was taken from (Woods 1970).
Both compiled versions of the grammar assume a depth-first search strategy
and use configurations which include the state, node, stack, registers,
features and hold list.

The  first  program  does  not  support  lexical  ambiguity  (neither  that 
caused  by  compound  rules  nor  that  caused  by  multiple  interpretations  under 
the  same  category).  In  addition,  it  neither  keeps  a well-formed  substring 
table,  tests  for  input  before  pushing  nor  returns  features  with  popped 
constituents.  The  second  program,  on  the  other  hand,  has  all  of  these 
capabilities.  The  listing  of  the  second  program  also  includes  tracing 
functions  the  compiler  includes  in  the  program  to  allow  the  user  to  follow 
its  operation.  Both  programs  are  given  in  CLISP  (Teitelman  1974). 

The  final  section  of  the  appendix  contains  a trace  of  the  first 
program  (using  a version  which  did  include  tracing  functions)  discovering 
all  possible  parses  of  the  sentence  "John  was  believed  to  have  been  shot  by 
Fred".  Shown  in  the  trace  are  all  of  the  arc  transitions  taken  by  the 
parser  together  with  all  register  setting  operations.  (The  reader  may 
compare  this  with  the  analysis  of  this  sentence  given  in  (Woods  1970).) 
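
The depth-first strategy with saved configurations can be illustrated on the
noun phrase part of the grammar alone. The Python sketch below is not
compiler output: it interprets a table of category arcs instead of jumping to
compiled labels, it has no PUSH arcs, stack or hold list, and the lexicon and
the names ARCS, FINAL, parse_np and build are assumptions made for the
example. What it does share with the compiled programs is the control scheme,
in which each untried arc is saved as an alternative configuration on a stack
and failure resumes from the most recent one.

```python
LEXICON = {"the": {"DET"}, "old": {"ADJ", "N"}, "man": {"N"}, "john": {"NPR"}}

# state -> list of (category, register, next state), in arc order
ARCS = {
    "NP/": [("DET", "DET", "NP/1"), ("NPR", "NPR", "NP/3")],
    "NP/1": [("ADJ", "ADJS", "NP/1"), ("N", "N", "NP/2")],
    "NP/2": [],
    "NP/3": [],
}
FINAL = {"NP/2", "NP/3"}  # states whose only arc is a POP


def build(regs):
    """Turn the (register, word) pairs into a dictionary, collecting
    every ADJS entry into one list."""
    result = {"ADJS": []}
    for register, word in regs:
        if register == "ADJS":
            result["ADJS"].append(word)
        else:
            result[register] = word
    return result


def parse_np(words):
    words = [w.lower() for w in words]
    # A configuration is (state, arc number, input position, registers).
    alts = [("NP/", 0, 0, ())]
    while alts:
        state, arc_no, pos, regs = alts.pop()  # take the newest alternative
        arcs = ARCS[state]
        if arc_no == len(arcs):
            # No arcs left: succeed only in a final state with no input left.
            if state in FINAL and pos == len(words):
                return build(regs)
            continue
        # Save the next arc of this state as an alternative before trying
        # the current one.
        alts.append((state, arc_no + 1, pos, regs))
        category, register, next_state = arcs[arc_no]
        if pos < len(words) and category in LEXICON.get(words[pos], set()):
            alts.append((next_state, 0, pos + 1,
                         regs + ((register, words[pos]),)))
    return None


print(parse_np("the old man".split()))
print(parse_np("the old".split()))  # found only after backing up
```

In the second call "old" is first taken as an adjective; when the input runs
out, the saved alternative is resumed and "old" is reread as the noun.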

The grammar


((S/
   (CAT AUX T
        (SETR V *)
        (SETR TNS (LIST (GETF * TENSE)))
        (SETRQ TYPE Q)
        (TO Q1/))
   (PUSH NP/ T
         (SETR SUBJ *)
         (SETRQ TYPE DCL)
         (TO Q2/)))

 (Q1/
   (PUSH NP/ T
         (SETR SUBJ *)
         (TO Q3/)))

 (Q2/
   (CAT V T
        (SETR V *)
        (SETR TNS (LIST (GETF * TENSE)))
        (TO Q3/)))

 (Q3/
   (CAT V (AND (GETF * PPRT)
               (EQ (GETR V)
                   (QUOTE BE)))
        (HOLD (GETR SUBJ))
        (SETR SUBJ (BUILDQ (NP (PRO SOMEONE))))
        (SETR AGFLAG T)
        (SETR V *)
        (TO Q3/))
   (CAT V (AND (GETF * PPRT)
               (EQ (GETR V)
                   (QUOTE HAVE)))
        (SETR TNS (APPEND (GETR TNS)
                          (QUOTE (PERFECT))))
        (SETR V *)
        (TO Q3/))
   (PUSH NP/ (TRANS (GETR V))
         (SETR OBJ *)
         (TO Q4/))
   (VIR NP (TRANS (GETR V))
        (SETR OBJ *)
        (TO Q4/))
   (POP (BUILDQ (S + + (TNS +) (VP (V +)))
                TYPE SUBJ TNS V)
        (INTRANS (GETR V))))

 (Q4/
   (WRD BY (GETR AGFLAG)
        (SETR AGFLAG NIL)
        (TO Q7/))
   (WRD TO (S-TRANS (GETR V))
        (TO Q5/))
   (POP (BUILDQ (S + + (TNS +) (VP (V +) +))
                TYPE SUBJ TNS V OBJ)
        T))

 (Q5/
   (PUSH VP/ T
         (SENDR SUBJ (GETR OBJ))
         (SENDR TNS (GETR TNS))
         (SENDRQ TYPE DCL)
         (SETR OBJ *)
         (TO Q6/)))

 (Q6/
   (WRD BY (GETR AGFLAG)
        (SETR AGFLAG NIL)
        (TO Q7/))
   (POP (BUILDQ (S + + (TNS +) (VP (V +) +))
                TYPE SUBJ TNS V OBJ)
        T))

 (Q7/
   (PUSH NP/ T
         (SETR SUBJ *)
         (TO Q6/)))

 (VP/
   (CAT V (GETF * UNTENSED)
        (SETR V *)
        (TO Q3/)))

 (NP/
   (CAT DET T
        (SETR DET *)
        (TO NP/1))
   (CAT NPR T
        (SETR NPR *)
        (TO NP/3)))

 (NP/1
   (CAT ADJ T
        (ADDL ADJS *)
        (TO NP/1))
   (CAT N T
        (SETR N *)
        (TO NP/2)))

 (NP/2
   (POP (BUILDQ (NP (DET +) (ADJ +) (N +))
                DET ADJS N)
        T))

 (NP/3
   (POP (BUILDQ (NP (NPR +))
                NPR)
        T))
)

Version I


(PARSER
  (LAMBDA (ACF)
    (PROG (STATE NODE STACK REGS HOLD * LEX)

The current status of the machine is kept in five global
variables: (1) STATE, the state/arc in the grammar, (2)
NODE, the pointer into the input, (3) REGS, the list of
register name-value pairs, (4) STACK, the return stack, and
(5) HOLD, the hold list. Putting the machine into a given
configuration involves assigning values to these five
variables.

SPREAD-ACF
    (STATE←(CF.STATE ACF))
    (REGS←(CF.REGS ACF))
    (STACK←(CF.STACK ACF))
    (HOLD←(CF.HOLD ACF))
    (NODE←(CF.NODE ACF))
    (LEX←(EDGE.WORD (FIRST.EDGE NODE)))

BRANCH  dispatches  control  to  the  label  specified  by  STATE. 
This  is  the  method  of  executing  an  arc. 


EVALARC
    (BRANCH STATE SUCCESS DETOUR S/ S/-2 S/-2-PUSH Q1/
            Q1/-1-PUSH Q2/ Q3/ Q3/-2 Q3/-3 Q3/-4 Q3/-5
            Q3/-3-PUSH Q4/ Q4/-2 Q4/-3 Q5/ Q5/-1-PUSH Q6/
            Q6/-2 Q7/ Q7/-1-PUSH VP/ NP/
            NP/-2 NP/1 NP/1-2 NP/2 NP/3)

SUCCESS  checks  to  make  sure  all  of  the  input  has  been 
processed.  If  not  it  detours. 


SUCCESS
    (if (EMPTYP.NODE NODE)
        then (RETURN *)
      else (GO DETOUR))

DETOUR  decides  which  alternative  to  try  next.  In  this  case 
the  alternatives  list  is  a stack. 


DETOUR
    (if ALTS
        then ACF←(ALTS.FIRST)
             (ALTS.BUTFIRST)
             (GO SPREAD-ACF)
      else (RETURN (FAILURE)))

This is the beginning of the code which is compiled from the
arcs. The first arc of each state has a label which is the
same as the state name in the ATN. The other arcs have a
label which is the state name followed by "-" and the arc
number. Labels which end in "-PUSH" indicate the actions
and termination action of PUSH arcs.

S/  (if  (ARCCAT  AUX) 

then  (ALTARC  S/-2) 

(SETR  V *) 

(SETR  'INS  < ( GETF  * 'TENSE) >) 

(SETRO  TYPE  0) 

DOTO  01/) 

(GO  01/)) 
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S/-2  ( DOPUSH  NP/  S/-2-PUSH) 

(G 0 NP/) 

S/-2-PUSH 

(SETH  'SUBJ  *) 

(SETRQ  TYPE  DCL ) 

(DOPTO  02/ ) 

(GO  02/) 

01/  DOPUSH  NP/  0 1 /- 1 -PUSH ) 

(GO  NP/) 

0 1 /- 1 -PUSH 

(SETH  'SUBJ  *) 

(DOPTO  03/) 

(GO  03/) 

02/  (if  (AHCCAT  V) 

then  (SETR  'V  *) 

(SETH  'TNS  <(GETF  * 'TENSE)>) 

(DOTO  Q 3/ ) 

(GO  Q3/)) 

(GO  DETOUR) 

Q3/  (while (ARCCAT V) and (GETF * 'PPRT)
            and (GETR V)='BE
        do (ALTARC Q3/-2)
           (HOLD (GETR SUBJ))
           (SETR 'SUBJ (BUILDQ (NP (PRO SOMEONE))))
           (SETR 'AGFLAG T)
           (SETR 'V *)
           (DOTO Q3/))

Q3/-2
     (if (ARCCAT V) and (GETF * 'PPRT)
            and (GETR V)='HAVE
         then (ALTARC Q3/-3)
              (SETR 'TNS <!(GETR TNS) !'(PERFECT)>)
              (SETR 'V *)
              (DOTO Q3/)
              (GO Q3/))

Q3/-3
     (if (TRANS (GETR V))
         then (ALTARC Q3/-4)
              (DOPUSH NP/ Q3/-3-PUSH)
              (GO NP/))

Q3/-4
     (if (HOLDSCAN HOLD 'NP '(TRANS (GETR V)))
         then (ALTARC Q3/-5)
              (PREVIRACTS)
              (SETR 'OBJ *)
              (DOVIRTO Q4/)
              (GO Q4/))

Q3/-5
     (if (INTRANS (GETR V))
         then (DOPOP (BUILDQ (S + + (TNS +) (VP (V +)))
                             TYPE SUBJ TNS V))
              (GO EVALARC))
     (GO DETOUR)

Q3/-3-PUSH
     (SETR 'OBJ *)
     (DOPTO Q4/)
     (GO Q4/)

Q4/  (if (ARCWRD BY) and (GETR AGFLAG)
         then (ALTARC Q4/-2)
              (SETR 'AGFLAG NIL)
              (DOTO Q7/)
              (GO Q7/))

Q4/-2
     (if (ARCWRD TO) and (S-TRANS (GETR V))
         then (ALTARC Q4/-3)
              (DOTO Q5/)
              (GO Q5/))

Q4/-3
     (DOPOP (BUILDQ (S + + (TNS +) (VP (V +) +))
                    TYPE SUBJ TNS V OBJ))
     (GO EVALARC)

Q5/  (SENDR 'SUBJ (GETR OBJ))
     (SENDR 'TNS (GETR TNS))
     (SENDRQ TYPE DCL)
     (DOPUSH VP/ Q5/-1-PUSH)
     (SREGS←NIL)
     (GO VP/)

Q5/-1-PUSH
     (SETR 'OBJ *)
     (DOPTO Q6/)
     (GO Q6/)

Q6/  (if (ARCWRD BY) and (GETR AGFLAG)
         then (ALTARC Q6/-2)
              (SETR 'AGFLAG NIL)
              (DOTO Q7/)
              (GO Q7/))

Q6/-2
     (DOPOP (BUILDQ (S + + (TNS +) (VP (V +) +))
                    TYPE SUBJ TNS V OBJ))
     (GO EVALARC)

Q7/  (DOPUSH NP/ Q7/-1-PUSH)
     (GO NP/)

Q7/-1-PUSH
     (SETR 'SUBJ *)
     (DOPTO Q6/)
     (GO Q6/)

VP/  (if (ARCCAT V) and (GETF * 'UNTENSED)
         then (SETR 'V *)
              (DOTO Q3/)
              (GO Q3/))
     (GO DETOUR)

NP/  (if (ARCCAT DET)
         then (ALTARC NP/-2)
              (SETR 'DET *)
              (DOTO NP/1)
              (GO NP/1))

NP/-2
     (if (ARCCAT NPR)
         then (SETR 'NPR *)
              (DOTO NP/3)
              (GO NP/3))
     (GO DETOUR)

NP/1 (while (ARCCAT ADJ)
        do (ALTARC NP/1-2)
           (ADDL ADJS *)
           (DOTO NP/1))

NP/1-2
     (if (ARCCAT N)
         then (SETR 'N *)
              (DOTO NP/2)
              (GO NP/2))
     (GO DETOUR)

NP/2 (DOPOP (BUILDQ (NP (DET +) (ADJ +) (N +))
                    DET ADJS N))
     (GO EVALARC)

NP/3 (DOPOP (BUILDQ (NP (NPR +))
                    NPR))
     (GO EVALARC))))

Version II


(PARSER 

(LAMBDA  ( ACF ) 

(PROG  (STATE  NODE  STACK  REGS  FEATS  HOLD  * LEX  SREGS 
SFEATS  FEATURES  TEMP) 


If  the  function  is  called  with  an  argument  of  'GO,  it  looks 
for  another  parse.  This  allows  the  user  to  get  out  more 
than  the  first  parse. 

(if  ACF= 'GO 

then  (GO  DETOUR)) 

The  current  status  of  the  machine  is  kept  in  five  global 
variables:  (1)  STATE,  the  state/arc  in  the  grammar,  (2) 
NODE,  the  pointer  into  the  input,  (3)  REGS,  the  list  of 
register  name-value  pairs,  (4)  STACK,  the  return  stack,  and 
(5)  HOLD,  the  hold  list.  Putting  the  machine  into  a given 
configuration  involves  assigning  values  to  these  five 
variables . 


SPREAD-ACF
(CHANGESTATE (CF.STATE ACF))
(REGS←(CF.REGS ACF))
(FEATS←(CF.FEATS ACF))
(STACK←(CF.STACK ACF))
(HOLD←(CF.HOLD ACF))
(LEX←(EDGE.WORD (FIRST.EDGE NODE←(CF.NODE ACF))))
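Putting the machine into a configuration amounts to copying a saved record into the working variables, and storing an alternative amounts to taking a snapshot of them. The Python sketch below illustrates that round trip for the five variables named in the text. The class names, field types, and the snapshot helper are assumptions for illustration, and the sketch ignores the FEATS field that the Version II listing also spreads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Configuration:
    """An immutable saved machine status (hypothetical layout)."""
    state: str
    node: tuple          # remaining input words
    regs: tuple = ()     # register name-value pairs
    stack: tuple = ()    # return stack for PUSH arcs
    hold: tuple = ()     # hold list


class Machine:
    def spread(self, cf):
        """Load a saved configuration into the working variables."""
        self.state = cf.state
        self.node = cf.node
        self.regs = dict(cf.regs)
        self.stack = list(cf.stack)
        self.hold = list(cf.hold)
        self.lex = cf.node[0] if cf.node else None   # current word

    def snapshot(self, state):
        """Save the current status so it can be resumed later at `state`."""
        return Configuration(state, tuple(self.node),
                             tuple(self.regs.items()),
                             tuple(self.stack), tuple(self.hold))


m = Machine()
m.spread(Configuration("S/", ("JOHN", "WAS", "SHOT")))
m.regs["SUBJ"] = "JOHN"
m.node = m.node[1:]
alt = m.snapshot("Q2/")          # store an alternative
m.regs["SUBJ"] = "SOMEONE"       # later work overwrites the register
m.spread(alt)                    # detour: the saved status is restored
print(m.state, m.regs, m.lex)
```

Keeping the saved record immutable means later register changes cannot leak into a stored alternative, which is what makes resuming it safe.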

TRACEALTSTART  is  one  of  the  tracing  functions  provided  to 
allow  the  user  to  follow  the  operations  of  the  parser.  The 
others  are  TRACEARC  and  ABORT.  None  of  these  result  in  any 
code  when  a fast  version  of  the  parser  is  produced. 

( TRACEALTSTART) 

(GO  EVALARC) 

NEXTLEX

If the current node has more than one lexical interpretation
(BUTFIRST.EDGE), the code sets NODE to try the next one.

(if (BUTFIRST.EDGE NODE)
    then LEX←(EDGE.WORD (FIRST.EDGE NODE←(BUTFIRST.EDGE NODE)))
         (GO EVALARC))


BRANCH dispatches control to the label specified by STATE.

EVALARC
(BRANCH STATE SUCCESS DETOUR S/ S/-1-CONT S/-2
        S/-1-CAT S/-2-PUSH Q1/ Q1/-1-PUSH Q2/
        Q2/-1-CONT Q2/-1-CAT Q3/ Q3/-1-CONT
        Q3/-2 Q3/-2-CONT Q3/-3 Q3/-4 Q3/-5
        Q3/-1-CAT Q3/-2-CAT Q3/-3-PUSH Q4/ Q4/-2 Q4/-3
        Q5/ Q5/-1-PUSH Q6/ Q6/-2 Q7/ Q7/-1-PUSH
        VP/ VP/-1-CONT VP/-1-CAT NP/ NP/-1-CONT
        NP/-2 NP/-2-CONT NP/-1-CAT NP/-2-CAT
        NP/1 NP/1-1-CONT NP/1-2 NP/1-2-CONT
        NP/1-1-CAT NP/1-2-CAT NP/2 NP/3)

SUCCESS
(RETURN NODE)

DETOUR chooses an alternative from the ALTS list. In this
version the ALTS list is a stack. The detouring mechanism
could be changed by redefining ALTS.FIRST and ALTS.BUTFIRST.
If there are no more alternatives, the first alternative
from the list of SUSPENDED alts is taken. The suspended
alternatives are maintained in order by weight.


ABORT
(ABORT)        ABORT is a tracing function.

DETOUR
(if ALTS
    then ACF←(ALTS.FIRST)
         (ALTS.BUTFIRST)
         (GO SPREAD-ACF)
  elseif SUSPENDEDALTS
    then ACF←(SUSPEND.POP)
         (GO SPREAD-ACF)
  else (RETURN (FAILURE)))
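The two-tier choice made by this DETOUR (ordinary alternatives first, then suspended ones in weight order) can be sketched in Python with a list used as a stack and a heap. This is an illustration under stated assumptions, not the original mechanism: the class and method names are invented, and the sketch assumes that a lower weight means "try sooner", which the text does not specify.

```python
import heapq


class Alternatives:
    """Ordinary alternatives on a stack; suspended ones ordered by weight."""

    def __init__(self):
        self.alts = []         # stack of ordinary alternatives
        self.suspended = []    # heap of (weight, sequence number, configuration)
        self._count = 0        # tie-breaker: equal weights stay first-in first-out

    def store(self, cf):
        self.alts.append(cf)

    def suspend(self, cf, weight):
        # Assumption: smaller weight is preferred.
        heapq.heappush(self.suspended, (weight, self._count, cf))
        self._count += 1

    def detour(self):
        """Return the next configuration to try, or None for failure."""
        if self.alts:
            return self.alts.pop()
        if self.suspended:
            return heapq.heappop(self.suspended)[2]
        return None


a = Alternatives()
a.suspend("s-heavy", 5)
a.suspend("s-light", 1)
a.store("alt1")
a.store("alt2")
order = [a.detour() for _ in range(5)]
print(order)
```

Suspended alternatives are touched only once the stack is empty, so they act as a fallback pool rather than competing with ordinary backtracking.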

S/   (if (ARCCAT AUX)
         else (GO S/-2))
     (ALTARC S/-2)
     (TRACEARC CAT AUX S/-1)

S/-1-CONT
     (ALTCAT S/-1-CAT)
     (SETR 'V *)
     (SETR 'TNS <(GETF * 'TENSE)>)
     (SETRQ TYPE Q)
     (DOTO Q1/)
     (GO Q1/)

S/-2 (if (STRINGLEFTP)
         then (NEXTLEXALT S/)
              (TRACEARC PUSH NIL S/-2)
              (DOPUSH NP/ S/-2-PUSH)
              (GO DETOUR))
     (CHANGESTATEQ S/)
     (GO NEXTLEX)

S/-1-CAT
     (ARCCAT AUX)
     (TRACEARC ALTCAT AUX S/-1)
     (GO S/-1-CONT)

S/-2-PUSH
     (SPREAD/WFS)
     (SETR 'SUBJ *)
     (SETRQ TYPE DCL)
     (DOPTO Q2/)
     (GO Q2/)

Q1/  (if (STRINGLEFTP)
         then (NEXTLEXALT Q1/)
              (TRACEARC PUSH NIL Q1/-1)
              (DOPUSH NP/ Q1/-1-PUSH)
              (GO DETOUR))
     (CHANGESTATEQ Q1/)
     (GO NEXTLEX)

Q1/-1-PUSH
     (SPREAD/WFS)
     (SETR 'SUBJ *)
     (DOPTO Q3/)
     (GO Q3/)

Q2/  (if (ARCCAT V)
         else (CHANGESTATEQ Q2/)
              (GO NEXTLEX))
     (NEXTLEXALT Q2/)
     (TRACEARC CAT V Q2/-1)

Q2/-1-CONT
     (ALTCAT Q2/-1-CAT)
     (SETR 'V *)
     (SETR 'TNS <(GETF * 'TENSE)>)
     (DOTO Q3/)
     (GO Q3/)

Q2/-1-CAT
     (ARCCAT V)
     (TRACEARC ALTCAT V Q2/-1)
     (GO Q2/-1-CONT)

Q3/  (if (ARCCAT V)
         else (GO Q3/-2))
     (ALTARC Q3/-2)
     (TRACEARC CAT V Q3/-1)

Q3/-1-CONT
     (ALTCAT Q3/-1-CAT)
     (if ~((GETF * 'PPRT) and (GETR V)='BE)
         then (GO ABORT))
     (HOLD (GETR SUBJ))
     (SETR 'SUBJ (BUILDQ (NP (PRO SOMEONE))))
     (SETR 'AGFLAG T)
     (SETR 'V *)
     (DOTO Q3/)
     (GO Q3/)

Q3/-2
     (if (ARCCAT V)
         else (GO Q3/-3))
     (ALTARC Q3/-3)
     (TRACEARC CAT V Q3/-2)

Q3/-2-CONT
     (ALTCAT Q3/-2-CAT)
     (if ~((GETF * 'PPRT) and (GETR V)='HAVE)
         then (GO ABORT))
     (SETR 'TNS <!(GETR TNS) !'(PERFECT)>)
     (SETR 'V *)
     (DOTO Q3/)
     (GO Q3/)

Q3/-3
     (if (STRINGLEFTP) and (TRANS (GETR V))
         then (ALTARC Q3/-4)
              (TRACEARC PUSH NIL Q3/-3)
              (DOPUSH NP/ Q3/-3-PUSH)
              (GO DETOUR))

Q3/-4
     (if TEMP←(HOLDSCAN HOLD 'NP '(TRANS (GETR V)))
         then (ALTARC Q3/-5)
              (TRACEARC VIR NP Q3/-4)
              (PREVIRACTS)
              (SETR 'OBJ *)
              (DOVIRTO Q4/)
              (GO Q4/))

Q3/-5
     (if (INTRANS (GETR V))
         then (NEXTLEXALT Q3/)
              (TRACEARC POP NIL Q3/-5)
              (DOPOP (BUILDQ (S + + (TNS +) (VP (V +)))
                             TYPE SUBJ TNS V)
                     (GETR POPFEATS))
              (GO DETOUR))
     (CHANGESTATEQ Q3/)
     (GO NEXTLEX)

Q3/-1-CAT
     (ARCCAT V)
     (TRACEARC ALTCAT V Q3/-1)
     (GO Q3/-1-CONT)

Q3/-2-CAT
     (ARCCAT V)
     (TRACEARC ALTCAT V Q3/-2)
     (GO Q3/-2-CONT)

Q3/-3-PUSH
     (SPREAD/WFS)
     (SETR 'OBJ *)
     (DOPTO Q4/)
     (GO Q4/)

Q4/  (if (ARCWRD BY) and (GETR AGFLAG)
         then (ALTARC Q4/-2)
              (TRACEARC WRD BY Q4/-1)
              (SETR 'AGFLAG NIL)
              (DOTO Q7/)
              (GO Q7/))

Q4/-2
     (if (ARCWRD TO) and (S-TRANS (GETR V))
         then (ALTARC Q4/-3)
              (TRACEARC WRD TO Q4/-2)
              (DOTO Q5/)
              (GO Q5/))

Q4/-3
     (NEXTLEXALT Q4/)
     (TRACEARC POP NIL Q4/-3)
     (DOPOP (BUILDQ (S + + (TNS +) (VP (V +) +))
                    TYPE SUBJ TNS V OBJ)
            (GETR POPFEATS))
     (GO DETOUR)

Q5/  (if (STRINGLEFTP)
         then (NEXTLEXALT Q5/)
              (TRACEARC PUSH NIL Q5/-1)
              (SENDR 'SUBJ (GETR OBJ))
              (SENDR 'TNS (GETR TNS))
              (SENDRQ TYPE DCL)
              (DOPUSH VP/ Q5/-1-PUSH)
              SREGS←NIL
              SFEATS←NIL
              (GO DETOUR))
     (CHANGESTATEQ Q5/)
     (GO NEXTLEX)

Q5/-1-PUSH
     (SPREAD/WFS)
     (SETR 'OBJ *)
     (DOPTO Q6/)
     (GO Q6/)

Q6/  (if (ARCWRD BY) and (GETR AGFLAG)
         then (ALTARC Q6/-2)
              (TRACEARC WRD BY Q6/-1)
              (SETR 'AGFLAG NIL)
              (DOTO Q7/)
              (GO Q7/))

Q6/-2
     (NEXTLEXALT Q6/)
     (TRACEARC POP NIL Q6/-2)
     (DOPOP (BUILDQ (S + + (TNS +) (VP (V +) +))
                    TYPE SUBJ TNS V OBJ)
            (GETR POPFEATS))
     (GO DETOUR)

Q7/  (if (STRINGLEFTP)
         then (NEXTLEXALT Q7/)
              (TRACEARC PUSH NIL Q7/-1)
              (DOPUSH NP/ Q7/-1-PUSH)
              (GO DETOUR))
     (CHANGESTATEQ Q7/)
     (GO NEXTLEX)

Q7/-1-PUSH
     (SPREAD/WFS)
     (SETR 'SUBJ *)
     (DOPTO Q6/)
     (GO Q6/)

VP/  (if (ARCCAT V)
         else (CHANGESTATEQ VP/)
              (GO NEXTLEX))
     (NEXTLEXALT VP/)
     (TRACEARC CAT V VP/-1)

VP/-1-CONT
     (ALTCAT VP/-1-CAT)
     (if ~(GETF * 'UNTENSED)
         then (GO ABORT))
     (SETR 'V *)
     (DOTO Q3/)
     (GO Q3/)

VP/-1-CAT
     (ARCCAT V)
     (TRACEARC ALTCAT V VP/-1)
     (GO VP/-1-CONT)

NP/  (if (ARCCAT DET)
         else (GO NP/-2))
     (ALTARC NP/-2)
     (TRACEARC CAT DET NP/-1)

NP/-1-CONT
     (ALTCAT NP/-1-CAT)
     (SETR 'DET *)
     (DOTO NP/1)
     (GO NP/1)

NP/-2
     (if (ARCCAT NPR)
         else (CHANGESTATEQ NP/)
              (GO NEXTLEX))
     (NEXTLEXALT NP/)
     (TRACEARC CAT NPR NP/-2)

NP/-2-CONT
     (ALTCAT NP/-2-CAT)
     (SETR 'NPR *)
     (DOTO NP/3)
     (GO NP/3)

NP/-1-CAT
     (ARCCAT DET)
     (TRACEARC ALTCAT DET NP/-1)
     (GO NP/-1-CONT)

NP/-2-CAT
     (ARCCAT NPR)
     (TRACEARC ALTCAT NPR NP/-2)
     (GO NP/-2-CONT)

NP/1 (if (ARCCAT ADJ)
         else (GO NP/1-2))
     (ALTARC NP/1-2)
     (TRACEARC CAT ADJ NP/1-1)

NP/1-1-CONT
     (ALTCAT NP/1-1-CAT)
     (ADDL ADJS *)
     (DOTO NP/1)
     (GO NP/1)

NP/1-2
     (if (ARCCAT N)
         else (CHANGESTATEQ NP/1)
              (GO NEXTLEX))
     (NEXTLEXALT NP/1)
     (TRACEARC CAT N NP/1-2)

NP/1-2-CONT
     (ALTCAT NP/1-2-CAT)
     (SETR 'N *)
     (DOTO NP/2)
     (GO NP/2)

NP/1-1-CAT
     (ARCCAT ADJ)
     (TRACEARC ALTCAT ADJ NP/1-1)
     (GO NP/1-1-CONT)

NP/1-2-CAT
     (ARCCAT N)
     (TRACEARC ALTCAT N NP/1-2)
     (GO NP/1-2-CONT)

NP/2 (NEXTLEXALT NP/2)
     (TRACEARC POP NIL NP/2-1)
     (DOPOP (BUILDQ (NP (DET +) (ADJ +) (N +))
                    DET ADJS N)
            (GETR POPFEATS))
     (GO DETOUR)

NP/3 (NEXTLEXALT NP/3)
     (TRACEARC POP NIL NP/3-1)
     (DOPOP (BUILDQ (NP (NPR +))
                    NPR)
            (GETR POPFEATS))
     (GO DETOUR))))

Trace of Version I Parsing a Sentence

PARSE((JOHN WAS BELIEVED TO HAVE BEEN SHOT BY FRED))

Starting alternative 0
At arc S/
Node = (((JOHN NPR (&)) ((WAS V & AUX &) (& &))))

The sentence is converted into a chart format. The chart
contains information about the possible parts of speech of each
word. Notice that "was" can be either a verb (V) or an auxiliary
verb (AUX). (An & is used to indicate a further structure.)
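The idea of a chart that records every possible part of speech for each word can be sketched in Python as follows. The lexicon, the flat list layout, and the function names are assumptions for illustration; the parser's actual chart is the nested edge structure printed in the Node lines of the trace.

```python
# Hypothetical toy lexicon: word -> list of (category, root form) readings.
LEXICON = {
    "JOHN": [("NPR", "JOHN")],
    "WAS": [("V", "BE"), ("AUX", "BE")],
    "BELIEVED": [("V", "BELIEVE")],
}


def build_chart(words, lexicon):
    """Return one entry per word: (word, [readings...])."""
    return [(w, list(lexicon.get(w, []))) for w in words]


def categories(chart, position):
    """List the categories available for the word at `position`."""
    return [cat for cat, _root in chart[position][1]]


chart = build_chart(["JOHN", "WAS", "BELIEVED"], LEXICON)
print(categories(chart, 1))
```

A category test on an arc then reduces to checking whether the arc's category is among the readings of the current word, and a word with several readings is what gives rise to lexical alternatives.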

Taking PUSH arc S/-2

The trace indicates the arc type and its location in the grammar.
No alternative is stored because S/-2 is the last arc in the
state S/ and there are no lexical alternatives.

PUSHing for NP/
Taking CAT NPR arc NP/-2
Setting NPR to JOHN

The trace also indicates where registers get set.

Entering state NP/3
Node = (((WAS V (&) AUX (&)) ((BELIEVED V &) (& &))))
Taking POP arc NP/3-1
Trying to POP
(Continuing arc S/-2-PUSH)
Setting SUBJ to (NP (NPR JOHN))
Setting TYPE to DCL

Entering state Q2/
Node = (((WAS V (&) AUX (&)) ((BELIEVED V &) (& &))))
Taking CAT V arc Q2/-1
Setting V to BE
Setting TNS to (PAST)

Entering state Q3/
Node = (((BELIEVED V (&)) ((TO PREP &) (& &))))

The alternative configuration to try the second arc leaving Q3/
(Q3/-2) is created and saved after the test has succeeded on the
first arc but before the arc is taken. This is alt 2 because
configuration 1 was created during the earlier PUSH arc (i.e.
the number is a configuration number).

Storing alt 2 for arc Q3/-2
Taking CAT V arc Q3/-1
HOLDing (NP (NPR JOHN))
Setting SUBJ to (NP (PRO SOMEONE))
Setting AGFLAG to T
Setting V to BELIEVE

Entering state Q3/
Node = (((TO PREP (&)) ((HAVE V &) (& &))))
Storing alt 3 for arc Q3/-4
Taking PUSH arc Q3/-3
PUSHing for NP/
BLOCKED

Starting alternative 3
At arc Q3/-4
Node = (((TO PREP (&)) ((HAVE V &) (& &))))
Storing alt 5 for arc Q3/-5
Taking VIR NP arc Q3/-4
(NP (NPR JOHN)) removed from HOLD list
Setting OBJ to (NP (NPR JOHN))

Entering state Q4/
Node = (((TO PREP (&)) ((HAVE V &) (& &))))
Storing alt 6 for arc Q4/-3
Taking WRD TO arc Q4/-2

Entering state Q5/
Node = (((HAVE V (&)) ((BEEN V &) (& &))))
Taking PUSH arc Q5/-1
SENDing SUBJ value of (NP (NPR JOHN))
SENDing TNS value of (PAST)
SENDing TYPE value of DCL
PUSHing for VP/
Taking CAT V arc VP/-1
Setting V to HAVE

Entering state Q3/
Node = (((BEEN V (&)) ((SHOT V &) (& &))))
Storing alt 8 for arc Q3/-3
Taking CAT V arc Q3/-2
Setting TNS to (PAST PERFECT)
Setting V to BE

Entering state Q3/
Node = (((SHOT V (&)) ((BY PREP &) (& NIL))))
Storing alt 9 for arc Q3/-2
Taking CAT V arc Q3/-1
HOLDing (NP (NPR JOHN))
Setting SUBJ to (NP (PRO SOMEONE))
Setting AGFLAG to T
Setting V to SHOOT

Entering state Q3/
Node = (((BY PREP (&)) ((FRED NPR &) NIL)))
Storing alt 10 for arc Q3/-4
Taking PUSH arc Q3/-3
PUSHing for NP/
BLOCKED

Starting alternative 10
At arc Q3/-4
Node = (((BY PREP (&)) ((FRED NPR &) NIL)))
Storing alt 12 for arc Q3/-5
Taking VIR NP arc Q3/-4
(NP (NPR JOHN)) removed from HOLD list
Setting OBJ to (NP (NPR JOHN))

Entering state Q4/
Node = (((BY PREP (&)) ((FRED NPR &) NIL)))
Storing alt 13 for arc Q4/-2
Taking WRD BY arc Q4/-1
Setting AGFLAG to NIL

Entering state Q7/
Node = (((FRED NPR (&)) NIL))
Taking PUSH arc Q7/-1
PUSHing for NP/
Taking CAT NPR arc NP/-2
Setting NPR to FRED

Entering state NP/3
Node = (NIL)
Taking POP arc NP/3-1
Trying to POP
(Continuing arc Q7/-1-PUSH)
Setting SUBJ to (NP (NPR FRED))

Entering state Q6/
Node = (NIL)
Taking POP arc Q6/-2
Trying to POP
(Continuing arc Q5/-1-PUSH)
Setting OBJ to (S DCL (NP (NPR FRED))
                      (TNS (PAST PERFECT))
                      (VP (V SHOOT) (NP (NPR JOHN))))

Entering state Q6/
Node = (NIL)
Taking POP arc Q6/-2
Trying to POP
Trying to SUCCEED


S DCL
  NP PRO SOMEONE
  TNS PAST
  VP V BELIEVE
     S DCL
       NP NPR FRED
       TNS PAST PERFECT
       VP V SHOOT
          NP NPR JOHN

One successful parse. Parser continues
because it was being run in a mode which
returns all possible parses.


Starting alternative 13
At arc Q4/-2
Node = (((BY PREP (&)) ((FRED NPR &) NIL)))
Taking POP arc Q4/-3
Trying to POP
(Continuing arc Q5/-1-PUSH)
Setting OBJ to (S DCL (NP (PRO SOMEONE))
                      (TNS (PAST PERFECT))
                      (VP (V SHOOT) (NP (NPR JOHN))))

Entering state Q6/
Node = (((BY PREP (&)) ((FRED NPR &) NIL)))
Storing alt 15 for arc Q6/-2
Taking WRD BY arc Q6/-1
Setting AGFLAG to NIL

Entering state Q7/
Node = (((FRED NPR (&)) NIL))
Taking PUSH arc Q7/-1
PUSHing for NP/
Taking CAT NPR arc NP/-2
Setting NPR to FRED

Entering state NP/3
Node = (NIL)
Taking POP arc NP/3-1
Trying to POP
(Continuing arc Q7/-1-PUSH)
Setting SUBJ to (NP (NPR FRED))

Entering state Q6/
Node = (NIL)
Taking POP arc Q6/-2
Trying to POP
Trying to SUCCEED

S DCL
  NP NPR FRED
  TNS PAST
  VP V BELIEVE                 Second possible parse.
     S DCL
       NP PRO SOMEONE
       TNS PAST PERFECT
       VP V SHOOT
          NP NPR JOHN

Starting alternative 15
At arc Q6/-2
Node = (((BY PREP (&)) ((FRED NPR &) NIL)))
Taking POP arc Q6/-2
Trying to POP
Trying to SUCCEED
BLOCKED

Starting alternative 12
At arc Q3/-5
Node = (((BY PREP (&)) ((FRED NPR &) NIL)))
BLOCKED

Starting alternative 9
At arc Q3/-2
Node = (((SHOT V (&)) ((BY PREP &) (& NIL))))
BLOCKED

Starting alternative 8
At arc Q3/-3
Node = (((BEEN V (&)) ((SHOT V &) (& &))))
BLOCKED

Starting alternative 6
At arc Q4/-3
Node = (((TO PREP (&)) ((HAVE V &) (& &))))
Taking POP arc Q4/-3
Trying to POP
Trying to SUCCEED
BLOCKED

Starting alternative 5
At arc Q3/-5
Node = (((TO PREP (&)) ((HAVE V &) (& &))))
BLOCKED

Starting alternative 2
At arc Q3/-2
Node = (((BELIEVED V (&)) ((TO PREP &) (& &))))
BLOCKED

NIL

Grammar Compiler Declarations

Some features of the general ATN parser require a good deal of
bookkeeping. For example, SYSCONJ requires a parser to save the path that
it takes through the grammar. This more than doubles the amount of storage
used. To relieve the burden of those features, such as SYSCONJ, which
increase the overhead, and which a particular application may not require,
the user can specify which features his grammar uses. The compiler will
then tailor the object code to those needs. The user specifications
consist of a collection of flags which are set at compile time. A
description of each flag together with its default setting is given below.


HOLDFLG: If the grammar does not use the HOLD facility, setting this flag
to NIL will eliminate one field in a configuration. Default is T.

FEATURESFLG: If the grammar doesn't use the feature facility, setting this
flag to NIL will eliminate one field in a configuration. Default is T.

WFSTFLG: If the grammar uses the well-formed substring feature, WFSTFLG
should be non-NIL. Default is NIL.

ALTCATSFLG: If this flag is NIL, the compiler will not compile the ability
to handle multiple interpretations of a word within a single category. If
ALTCATSFLG is a list, it will compile this ability into those CAT arcs
whose categories are members of the list. If T, it will compile this
ability into all CAT arcs. Default is T.

SYSCONJFLG: If the grammar uses the LUNAR SYSCONJ conjunction-handling
facility, SYSCONJFLG should be non-NIL. Default is NIL. (SYSCONJ has not
been implemented yet.)

STARTSTATE: This should be the start state of the grammar. Default value
is S/.

NULLPUSHFLG: If NULLPUSHFLG is non-NIL, a PUSH arc will never be taken if
there is no input left. Default setting is T.

UNAMBIGUOUS-CHART: If the input chart is never ambiguous, setting this
flag to a non-NIL value will avoid the checking for an alternative lexical
interpretation. Default is NIL.
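The effect of compile-time flags on the generated parser can be sketched in Python as a settings record consulted by the code generator. The record below uses the flag names and defaults as best they can be read from the list above. The two helper functions and the base field list are hypothetical and only illustrate how a flag might prune a configuration field or restrict a feature to certain CAT arcs.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class CompilerFlags:
    """Compile-time switches; defaults follow the descriptions in the text."""
    holdflg: bool = True
    featuresflg: bool = True
    wfstflg: bool = False
    altcatsflg: Union[bool, list] = True   # True, False, or a list of categories
    sysconjflg: bool = False
    startstate: str = "S/"
    nullpushflg: bool = True
    unambiguous_chart: bool = False


def configuration_fields(flags):
    """Fields a saved configuration needs (base list is an assumption)."""
    fields = ["STATE", "NODE", "REGS", "STACK"]
    if flags.holdflg:
        fields.append("HOLD")
    if flags.featuresflg:
        fields.append("FEATS")
    return fields


def compile_altcat(flags, category):
    """Should multiple-interpretation handling be compiled into this CAT arc?"""
    if isinstance(flags.altcatsflg, list):
        return category in flags.altcatsflg
    return bool(flags.altcatsflg)


default_fields = configuration_fields(CompilerFlags())
lean_fields = configuration_fields(CompilerFlags(holdflg=False, featuresflg=False))
print(default_fields, lean_fields)
```

Because the flags are read once when the grammar is compiled, an unused facility costs nothing at parse time: the corresponding field or test is simply never generated.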


This begins to legislate out PUSHes which do not use any of the
inputs. In practical terms, this means that a PUSHed-to network has to do
more than just take constituents off the hold list. In theoretical terms,
it closes one of the holes which may allow an ATN grammar to be
undecidable.

Declarations for Arc Tests and Actions


The tests and actions on an arc can be arbitrary LISP expressions. To
compile these function calls, the grammar compiler must know which
arguments get evaluated. In general the grammar compiler gets this
information from the same declarations about functions that the LISP
compiler uses (NLAMA, NLAML, FNTYPE, etc.). In addition a facility is
provided which allows the user to tell the grammar compiler how to compile
the individual arguments to particular functions. Using this facility it
is possible to write function calls in the grammar which implicitly QUOTE
some of their arguments and evaluate others or even which call another
function to decode their arguments. The compiler is told how to compile
the arguments to a function by putting a specification as the value of the
property GRAMMARARGINFO on the property list of the function name. The
value of GRAMMARARGINFO property should be one of the following:

1) LAMBDA: the function evaluates all of its arguments. (This is the
default case.)

2) NLAMBDA: the function doesn't evaluate any of its arguments. This
can also be done by putting the function on either of the lists NLAMA
or NLAML (see INTERLISP compiler).

3) A list which specifies how each argument should be treated. Each
element of the list can be:

1) E or NIL - This argument position will be evaluated. This is the
usual case where the action expects its argument to be evaluated and
tells the grammar compiler to scan the argument for embedded calls.

2) QUOTE - This argument is embedded in QUOTE. This provides a
convenient way of automatically quoting certain argument positions
in a function call.

3) * - The argument is not compiled by the grammar compiler but is
merely copied. Note: Arguments which occur in this position should
not have any embedded functions as these will not be scanned by the
compiler.

4) Any other atom - The atom is the name of a function which when
APPLYed to the argument returns the compiled form.


Examples: The grammar function SETR, which sets the value of a register [2],
could be compiled by having a GRAMMARARGINFO property of (QUOTE E). The
arc action (SETR ANAPHORFLG T) would compile into (SETR (QUOTE ANAPHORFLG)
T). SETR is defined as a LAMBDA function (i.e. the interpreter evaluates
its arguments) which avoids the explicit call to EVAL which results from
having SETR be a NLAMBDA function (i.e. the interpreter doesn't evaluate
its arguments).

[2] SETR is, in fact, recognized specially by the grammar compiler so that
it can keep track of the registers which are used in the grammar.
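The per-argument treatment described by a GRAMMARARGINFO list can be sketched in Python as a small rewriting pass. This is an illustration under stated assumptions rather than the grammar compiler itself: Lisp forms are modeled as Python lists, decoder functions are Python callables instead of atoms naming functions, arguments beyond the end of a spec list are assumed to be evaluated, and decode_register is a hypothetical decoder that mimics the register-decoding behaviour the text attributes to CEVALLOC.

```python
# Lisp forms are modeled as Python lists; atoms are strings.

def compile_form(form, arginfo):
    """Compile one grammar function call according to its argument spec."""
    if not isinstance(form, list):
        return form
    fn, args = form[0], form[1:]
    spec = arginfo.get(fn, "LAMBDA")
    if spec == "LAMBDA":
        specs = ["E"] * len(args)            # evaluate every argument
    elif spec == "NLAMBDA":
        specs = ["*"] * len(args)            # copy every argument untouched
    else:
        # Assumption: positions past the end of the list are evaluated.
        specs = list(spec) + ["E"] * (len(args) - len(spec))
    out = [fn]
    for arg, how in zip(args, specs):
        if how in ("E", None):
            out.append(compile_form(arg, arginfo))   # scan for embedded calls
        elif how == "QUOTE":
            out.append(["QUOTE", arg])
        elif how == "*":
            out.append(arg)
        else:
            out.append(how(arg))             # decoder returns the compiled form
    return out


def decode_register(arg):
    """Hypothetical decoder: '*' or None is the current thing, an atom is a
    register to fetch, and a list is left to be evaluated as written."""
    if arg in ("*", None):
        return "*"
    if isinstance(arg, str):
        return ["GETR", arg]
    return arg


ARGINFO = {
    "SETR": ["QUOTE", "E"],
    "MARKER": [decode_register, "QUOTE"],
}

setr_out = compile_form(["SETR", "ANAPHORFLG", "T"], ARGINFO)
marker_out = compile_form(["MARKER", "N", "MASS"], ARGINFO)
print(setr_out, marker_out)
```

The decoding happens once, when the grammar is compiled, so the emitted call carries explicit QUOTE and GETR forms and needs no run-time inspection of its arguments.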

In the LUNAR grammar, many of the arc functions use EVALLOC to
evaluate one or more of their arguments. EVALLOC has three options: (1)
if its argument is * or NIL, it gets the value of the current thing - *;
(2) if the argument is atomic, it is a register whose value is retrieved;
and (3) if the argument is a list, it is evaluated. This allows the
grammar to be clearer and less cluttered with predictable function calls.
To accomplish the same results using the compiler, a version of EVALLOC
(CEVALLOC) is provided which returns the form for the decoded argument.
The functions which use it are then given a GRAMMARARGINFO property of
CEVALLOC for those argument positions which need decoding. This means that
the decoding process takes place once at compile time instead of each time
the arc is tried. For example, in the LUNAR grammar the function MARKER
has a GRAMMARARGINFO property of (CEVALLOC QUOTE). This allows the grammar
to have (MARKER N MASS) as an action which compiles into
(MARKER (GETR N) (QUOTE MASS)) and avoids an explicit call to EVAL in
MARKER. Notice that by using this technique, the grammar writer can easily
specify default arguments to actions in his grammar (at very little
computational cost) and greatly improve the readability of the grammar.

Appendix F
Debugging Features

Since  the  compiler  transforms  the  grammar  into  a program,  the  grammar 
writer  can  use  the  debugging  features  of  the  object  language  to  aid  in 
debugging  his  grammar.  These  should,  of  course,  be  augmented  by  some 
features  particular  to  grammars,  but  these  are  best  integrated  into  an 
existing  framework.  The  following  section  describes  a collection  of 
grammar  debugging  tools  that  have  been  integrated  into  the  INTERLISP 
system.

The debugging facilities can be grouped into two major categories:
tracing  and  breaking.  The  trace  will  show  all  grammar  transitions  and 
register-changing  operations.  In  debugging  mode,  the  system  will  even  keep 
a complete  history  of  the  parse  so  that  the  user  can  back  up.  In  addition, 
the  user  has  the  ability  to  stop  the  parser  at  the  end  of  each  line  of  the 

trace  in  order  to  look  around  in  and/or  change  the  current  environment. 

Tracing

The trace package causes the functions in the object language program
to print out what they are doing. There are three types of actions which
may be included in the trace: (1) arc transitions, storing of alternatives
and hold list operations; (2) setting of registers; and (3) sending of
registers to a PUSH configuration. The latter two of these can be turned
off independently. In addition, the debugging system allows the
trace to be sent to a disk file and not to TTY. (If the user wants both
copies, he can use the INTERLISP DRIBBLE facility.)

Breaks

The break package allows the user to stop [illegible]
check the states of the current configuration [illegible]
previously blocked configurations; [illegible]
parse to examine more closely [illegible]
BREAK1 (the LISP Break executive) augmented with some special functions and
BREAKMACROS. Since the user is talking to BREAK1, he can use any of the
LISP break commands or execute any LISP functions as well as the special
commands described below. He can also use the special commands while
inside of a break caused by having broken one of his functions or typing
Control-H or Control-B.

How to Get into a Break

Whenever the trace package prints a line of tracing information, and
the variable PAUSEFLAG has a non-NIL value, the trace package will wait for
the user to indicate whether to continue or break. A break is caused by
typing PAUSECHAR (initially [illegible]). Continuation is caused by typing
CONTINUECHAR (initially [illegible]). All other characters are ignored. If
PAUSECHAR is typed, BREAK1 is entered. The parsing is resumed by using one
of the Break exiting commands, or by using one of the special commands
described below. Note: [illegible] is equivalent to the Break command "OK".

Grammar  Break  Commands 


Printing  Out  Parsing  Information: 


The  following  commands  and  functions  are  provided  to  print  out 

information  associated  with  a configuration. 

1)  CF  - a Break  command  which  prints  out  the  present  status  of  the 
currently  active  configuration. 

2)  PPCF(n)  - prints  out  the  status  of  configuration  number  n. 

Note:  both  CF  and  PPCF  only  print  non-NIL  information  about  a 

configuration.  Also  PRINTLEVEL  is  set  to  4 when  debugging.  It  can  be 

reset  to  a higher  (or  lower)  number  if  the  user  wants  more  (or  less) 
information  printed. 

3)  PT  - a Break  command  which  tree  prints  (PPTT  - see  below)  the  current 
structure  (*).  This  is  most  useful  after  a POP  to  examine  the 
structure  which  was  POPped. 

4)  PPTT(x)  - prints  the  structure  x in  a tree  format  without  parentheses. 

5)  CFARRAYDUMP(ST END)  - dumps the contents of the configuration array
from configuration number ST to configuration number END. If ST is
NIL, 0 is used. If END is NIL, the largest configuration, FREECF#, is
used.

Commands to Back up the Parser


The following commands are used to change the flow of control of the
parser while debugging. In order to use AGAIN or BACKUP, the parser must
be run in PATH mode, which saves a new configuration each time an arc is
taken.


1)  AGAIN  - a break  command  which  restarts  the  current  configuration, 
i.e.  goes  back  to  the  most  recent  arc  transition  and  starts  again. 
The  effect  is  to  redo  the  current  arc.  If  the  user  discovers  that 
this  did  not  back  up  far  enough,  he  can  use  the  command  BACKUP. 

2)  BACKUP  - a break  command  which  restarts  the  configuration  which  led 
to  the  current  one.  BACKUP  may  be  invoked  successively  to  back  up 
more  than  one  arc  transition. 

3)  ABORT  - a break  command  which  ABORTS  the  current  configuration.  The 
next  active  configuration  will  be  taken  from  the  ALTS  list. 

Note: AGAIN and BACKUP are useful if an arc is taken (or not taken) when
it should not have been (or should have been). The predicates or functions
involved in the offending arc test can be broken (using the LISP function
BREAK) and then AGAIN or BACKUP can be called to redo the arc.

4)  FIRE(n)  - aborts  the  current  configuration  and  starts  the 
configuration  n.  If  n is  on  the  ALTS  list,  the  ALTS  list  is  POPped 
to  the  configuration  before  n. 

5)  PARSER(n)  - recursively invokes the parser on configuration n. This
provides a way of exploring one of the configurations on the ALTS
list or returning to a (much) earlier configuration. Note: After
PARSER returns, the user is still in the same place with respect to
the current parse (except that he may have fewer configurations
left, his alternative lists may have been altered and his WFST may
contain more entries).

Appendix G

ATN Description of Part of the SOPHIE Semantic Grammar


This appendix gives an ATN description of the same subset of the
language as presented in Appendix A. Of the 24 rules listed in Appendix A,
15 became "syntactic" categories, 3 were incorporated into other networks
and 6 remained non-terminals. The first section presents the ATN in its
graphic form. The second section presents the ATN as it is input to the
compiler.


Input  Form  of  Semantic  ATN 


(MEASUREMENT/
  (GROUP
    (CAT SECTION T
      (SETR WHERE *)
      (TO MEAS/SECTION))
    (WRD (INPUT OUTPUT) T
      (SETR I/O LEX)
      (TO MEAS/I/O))
    (CAT MEAS/QUANT T
      (SETR QUANT *)
      (TO MEAS/QUANT))
    (CAT JUNCTION T
      (SETR TERM *)
      (TO MEAS/TERM))
    (CAT TERM/TYPE T
      (SETR TERM *)
      (TO MEAS/TERM))
    (CAT TRANSFORMER T
      (SETR WHERE *)
      (TO MEAS/SECTION))))

(MEAS/SECTION
  (GROUP
    (WRD (INPUT OUTPUT) T
      (SETR I/O *)
      (TO MEAS/SECT/I/O))
    (JUMP MEAS/SECT/I/O T)))

(MEAS/SECT/I/O
  (GROUP
    (CAT MEAS/QUANT T
      (SETR QUANT *)
      (TO MEAS/END))
    (JUMP MEAS/END (GETR I/O))))

(MEAS/I/O
  (GROUP
    (CAT MEAS/QUANT T
      (SETR QUANT *)
      (TO MEAS/I/O/QUANT))
    (JUMP MEAS/I/O/QUANT T)))

(MEAS/I/O/QUANT
  (GROUP
    (CAT PREP T
      (TO MEAS/I/O/QUANT))
    (CAT TRANSFORMER T
      (SETR WHERE *)
      (TO MEAS/END))
    (CAT SECTION T
      (SETR WHERE *)
      (TO MEAS/END))
    (JUMP MEAS/END T)))

(MEAS/TERM
  (CAT MEAS/QUANT T
    (SETR QUANT *)
    (TO MEAS/TERM/Q)))

(MEAS/TERM/Q
  (GROUP
    (CAT PREP T
      (TO MEAS/TERM/PREP))
    (JUMP MEAS/TERM/PREP T)))

(MEAS/TERM/PREP
  (GROUP
    (CAT PART T
      (SETR WHERE (BUILDQ (+ *)
                          TERM))
      (TO MEAS/END))
    (JUMP MEAS/END T
      (SETR WHERE (BUILDQ (+ (PREF ))
                          TERM
                          (PART/RANGE TERM))))))


(MEAS/QUANT
  (GROUP
    (WRD (ON OF) T
      (SETRQ CLASSES (PART TERMINAL JUNCTION NODE SECTION))
      (TO MEAS/PREP))
    (WRD AT T
      (SETRQ CLASSES (NODE TERMINAL))
      (TO MEAS/PREP))
    (WRD (BETWEEN FROM) T
      (TO MEAS/BETWEEN))
    (WRD ACROSS T
      (SETRQ CLASSES (PART JUNCTION))
      (TO MEAS/PREP))
    (WRD IN T
      (SETRQ CLASSES (PART TERMINAL JUNCTION SECTION))
      (SETRQ I/O INPUT)
      (TO MEAS/PREP))
    (WRD THROUGH T
      (SETRQ CLASSES (PART TERMINAL JUNCTION SECTION))
      (TO MEAS/PREP))
    (WRD (OUT FROM) T
      (SETRQ CLASSES (SECTION))
      (SETRQ I/O OUTPUT)
      (TO MEAS/PREP))
    (JUMP MEAS/PREP T))
  (POP (BUILDQ (REFERENCE ((QUANT)
                           +))
               QUANT)
       T))


(MEAS/PREP
  (PUSH CIRCUIT/PLACE/ T
    (SENDRQ NOPRO T)
    (SETR WHERE *)
    (TO MEAS/END))
  (PUSH JUNCTION/ T
    (SENDRQ NOPRO T)
    (SETR WHERE *)
    (TO MEAS/END))
  (PUSH PART/ T
    (SENDRQ NOPRO T)
    (SETR WHERE *)
    (TO MEAS/END))
  (CAT SECTION T
    (SENDRQ NOPRO T)
    (SETR WHERE *)
    (TO MEAS/END))
  (PUSH PRONOUN/ (GETR CLASSES)
    (SENDR TYPES (GETR CLASSES))
    (SETR WHERE *)
    (TO MEAS/END)))

(MEAS/END
  (POP (BUILDQ (MEASURE + + +)
               QUANT WHERE I/O)
       T))


(MEAS/BETWEEN
  (PUSH PRONOUN/ T
    (SENDRQ TYPES (NODE TERMINAL))
    (SETR NODE1 *)
    (TO MEAS/BET/N1))
  (PUSH CIRCUIT/PLACE/ T
    (SETR NODE1 *)
    (TO MEAS/BET/N1))
  (PUSH NODE/BET T
    (SETR WHERE *)
    (TO MEAS/END))
  (WRD OUTPUT T
    (TO MEAS/BET/OUT)))

(MEAS/BET/N1
  (WRD (TO AND) T
    (TO MEAS/BET/AND)))

(MEAS/BET/AND
  (PUSH CIRCUIT/PLACE/ T
    (SETR NODE2 *)
    (TO MEAS/BET/END)))


(MEAS/BET/END
  (POP (BUILDQ (MEASURE + + +)
               QUANT NODE1 NODE2)
       T))


(CIRCUIT/PLACE/
  (JUMP TERMINAL/ T)
  (JUMP NODE/ T)
  (WRD THERE T
    (SETR POPVAL (BUILDQ (PREF (NODE TERMINAL))))
    (TO POP/VAL/)))

(NODE/
  (GROUP
    (WRD (NODE N) T
      (TO NODE/1))
    (JUMP NODE/1 T)))


(NODE/1
  (GROUP
    (WRD (BETWEEN JUNCTION) T
      (TO NODE/BET))
    (CAT NODE T
      (SETR NODE *)
      (TO NODE/END))
    (CAT INTEGER (AND (IGREATERP * -1)
                      (ILESSP * 27))
      (SETR NODE (PACK (LIST (QUOTE N)
                             *)))
      (TO NODE/END))))

(NODE/BET
  (GROUP
    (WRD OF T
      (TO NODE/BET))
    (CAT SECTION T
      (SETR PART1 *)
      (TO NODE/BET/P1)))
  (PUSH PART/ T
    (SETR PART1 *)
    (TO NODE/BET/P1)))

(NODE/BET/P1
  (WRD AND T
    (TO NODE/BET/AND)))

(NODE/BET/AND
  (PUSH PART/ T
    (SETR NODE (BUILDQ (NODE/BETWEEN + *)
                       PART1))
    (TO NODE/END))
  (CAT SECTION T
    (SETR NODE (BUILDQ (NODE/BETWEEN + *)
                       PART1))
    (TO NODE/END)))

(NODE/END
  (POP (GETR NODE)
       T))

(TERMINAL/
  (GROUP
    (CAT TERM/TYPE T
      (SETR TERM/TYPE *)
      (TO TERM/TYPE))
    (WRD ITS T
      (TO TERM/ITS))
    (WRD OUTPUT T
      (TO TERM/OP))))

(TERM/TYPE
  (GROUP
    (WRD TERMINAL T
      (TO TERM/TYPE/2))
    (JUMP TERM/TYPE/2 T)))


(TERM/PREP
  (PUSH PART/ T
    (SETR PART *)
    (TO TERM/TERM))
  (WRD ONE T
    (SETR PART (BUILDQ (PREF )
                       (PART/RANGE TERM/TYPE)))
    (TO TERM/TERM))
  (JUMP TERM/TERM T
    (SETR PART (BUILDQ (PREF )
                       (PART/RANGE TERM/TYPE)))))

(TERM/TERM
  (POP (BUILDQ (+ +)
               TERM/TYPE PART)
       T))

(TERM/ITS
  (CAT TERM/TYPE T
    (SETR TERM/TYPE *)
    (TO TERM/ITS/END)))

(TERM/ITS/END
  (POP (BUILDQ (+ (PREF ))
               TERM/TYPE
               (PART/RANGE TERM/TYPE))
       T))




(PART/
  (GROUP
    (CAT PART T
      (SETR PART *)
      (TO PART/END))
    (WRD (Q R D C) T
      (SETR TYPE *)
      (TO PART/ABBEV))
    (WRD LOAD T
      (SETRQ PART LOAD)
      (TO PART/END))
    (CAT SECTION T
      (SETR SECTION *)
      (SETRQ CLASSES (CAPACITOR DIODE RESISTOR TRANSISTOR
                      ZENER/DIODE TRANSFORMER))
      (TO PART/SECTION))
    (JUMP PRONOUN/ T
      (SETRQ TYPES (PART)))))

(PART/ABBEV
  (CAT INTEGER T
    (SETR PART (PACK (LIST (GETR TYPE)
                           *)))
    (TO PART/END)))

(PART/SECTION
  (TST RIGHT/TYPE (MEMB LEX (GETR CLASSES))
    (SETR PART (BUILDQ (FINDPART + )
                       SECTION LEX))
    (TO PART/END)))


(PART/END
  (POP (GETR PART)
       T))


(PRONOUN/
  (GROUP
    (WRD IT T
      (TO PRO/END))
    (WRD THAT T
      (TO PRO/THAT))))


(PRO/THAT
  (TST TYPE/CHECK (MEMB LEX (GETR TYPES))
    (SETR TYPES (LIST LEX))
    (TO PRO/END)))


(PRO/END
  (POP (BUILDQ (PREF +)
               TYPES)
       T))

(JUNCTION/
  (GROUP
    (CAT JUNCTION T
      (SETR JUNCTION *)
      (TO JUNC/JUNC))
    (WRD ITS (NULLR NOPRO)
      (TO JUNCTION/))))

(JUNC/JUNC
  (GROUP
    (WRD (JUNCTION CIRCUIT OF) T
      (TO JUNC/JUNC))
    (JUMP JUNC/OF T)))

(JUNC/OF
  (GROUP
    (CAT TRANSISTOR T
      (SETR TRAN *)
      (TO JUNC/END))
    (JUMP JUNC/END (NULLR NOPRO)
      (SETR TRAN (LIST (QUOTE PREF)
                       (QUOTE (TRANSISTOR)))))))

(JUNC/END
  (POP (BUILDQ (+ +)
               JUNCTION TRAN)
       T))
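
To make the arc notation concrete, here is a minimal Python sketch of how a few of the networks above might be traversed. It is an illustration under stated assumptions, not the ATN compiler described in this report: it transcribes only the CAT SECTION arc of MEASUREMENT/ together with MEAS/SECTION, MEAS/SECT/I/O and MEAS/END, it treats * as the current word (no root forms or PUSH arcs), it accepts a POP only when the input is exhausted, and the lexicon entries and the names LEXICON, NETWORK, setr and parse are hypothetical.

```python
# Hypothetical lexicon: these word/category pairs are illustrative only.
LEXICON = {"LIMITER": {"SECTION"}, "VOLTAGE": {"MEAS/QUANT"}}


def setr(name):
    """Model (SETR name *): store the current word in register `name`."""
    return lambda regs, word: {**regs, name: word}


def always(regs):
    """Model an arc test of T."""
    return True


# Each arc is (kind, label, test, action, next_state).
NETWORK = {
    "MEASUREMENT/": [
        ("CAT", "SECTION", always, setr("WHERE"), "MEAS/SECTION"),
    ],
    "MEAS/SECTION": [
        ("WRD", ("INPUT", "OUTPUT"), always, setr("I/O"), "MEAS/SECT/I/O"),
        ("JUMP", None, always, None, "MEAS/SECT/I/O"),
    ],
    "MEAS/SECT/I/O": [
        ("CAT", "MEAS/QUANT", always, setr("QUANT"), "MEAS/END"),
        # (JUMP MEAS/END (GETR I/O)): allowed only if I/O was set.
        ("JUMP", None, lambda regs: "I/O" in regs, None, "MEAS/END"),
    ],
    "MEAS/END": [
        ("POP", None, always, None, None),
    ],
}


def parse(words, state="MEASUREMENT/", pos=0, regs=None):
    """Depth-first search over the arcs, trying them in listed order."""
    regs = regs or {}
    for kind, label, test, action, nxt in NETWORK[state]:
        if not test(regs):
            continue
        if kind == "POP":
            if pos == len(words):
                # Model (BUILDQ (MEASURE + + +) QUANT WHERE I/O).
                return ("MEASURE", regs.get("QUANT"),
                        regs.get("WHERE"), regs.get("I/O"))
            continue
        if kind == "JUMP":
            result = parse(words, nxt, pos, regs)   # consumes no input
        else:
            if pos >= len(words):
                continue
            word = words[pos]
            if kind == "CAT":
                matched = label in LEXICON.get(word, ())
            else:                                   # WRD arc
                matched = word in label
            if not matched:
                continue
            result = parse(words, nxt, pos + 1, action(regs, word))
        if result is not None:
            return result
    return None


print(parse(["LIMITER", "OUTPUT", "VOLTAGE"]))
```

In this sketch a failed arc simply returns None so that the caller tries its next arc, which stands in for the alternatives mechanism of the real parser. An input consisting of a section name alone is rejected, because the JUMP out of MEAS/SECT/I/O is guarded by the I/O register.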